Abstract

The Chow parameters of a Boolean function $f : \{-1,1\}^n \to \{-1,1\}$ are its $n+1$ degree-0 and
degree-1 Fourier coefficients. It has been known since 1961 [Cho61, Tan61] that the (exact values of the)
Chow parameters of any linear threshold function $f$ uniquely specify $f$ within the space of all Boolean
functions, but until recently [OS11] nothing was known about efficient algorithms for reconstructing $f$
(exactly or approximately) from exact or approximate values of its Chow parameters. We refer to this
reconstruction problem as the Chow Parameters Problem.

Our main result is a new algorithm for the Chow Parameters Problem which, given (sufficiently
accurate approximations to) the Chow parameters of any linear threshold function $f$, runs in time
$\tilde{O}(n^2) \cdot (1/\epsilon)^{O(\log^2(1/\epsilon))}$ and with high probability outputs a representation of an LTF $f'$
that is $\epsilon$-close to $f$. The only previous algorithm [OS11] had running time
$\mathrm{poly}(n) \cdot 2^{2^{\tilde{O}(1/\epsilon^2)}}$.

As a byproduct of our approach, we show that for any linear threshold function $f$ over $\{-1,1\}^n$, there
is a linear threshold function $f'$ which is $\epsilon$-close to $f$ and has all weights that are integers at most
$\sqrt{n} \cdot (1/\epsilon)^{O(\log^2(1/\epsilon))}$. This significantly improves the best previous result of [DS09], which gave a
$\mathrm{poly}(n) \cdot 2^{\tilde{O}(1/\epsilon^{2/3})}$ weight bound, and is close to the known lower bound of
$\max\{\sqrt{n}, (1/\epsilon)^{\Omega(\log\log(1/\epsilon))}\}$ [Gol06, Ser07]. Our techniques also yield improved algorithms for
related problems in learning theory.

In addition to being significantly stronger than previous work, our results are obtained using
conceptually simpler proofs. The two main ingredients underlying our results are (1) a new structural result
showing that for $f$ any linear threshold function and $g$ any bounded function, if the Chow parameters of
$f$ are close to the Chow parameters of $g$ then $f$ is close to $g$; (2) a new boosting-like algorithm that, given
approximations to the Chow parameters of a linear threshold function $f$, outputs a bounded function whose
Chow parameters are close to those of $f$.

1 Introduction



1.1 Background and motivation. A linear threshold function, or LTF, over $\{-1,1\}^n$ is a Boolean
function $f : \{-1,1\}^n \to \{-1,1\}$ of the form

$$f(x) = \mathrm{sign}\Big(\sum_{i=1}^{n} w_i x_i - \theta\Big),$$

where $w_1, \dots, w_n, \theta \in \mathbb{R}$. The function $\mathrm{sign}(z)$ takes value $1$ if $z \ge 0$ and takes value $-1$ if $z < 0$; the
$w_i$'s are the weights of $f$ and $\theta$ is the threshold. Linear threshold functions have been intensively studied
for decades in many different fields. They are variously known as "halfspaces" or "linear separators" in
machine learning and computational learning theory, "Boolean threshold functions," "(weighted) threshold
gates" and "(Boolean) perceptrons (of order 1)" in computational complexity, and as "weighted majority
games" in voting theory and the theory of social choice. Throughout this paper we shall refer to them
simply as LTFs.

The Chow parameters of a function $f : \{-1,1\}^n \to \mathbb{R}$ are the $n+1$ values

$$\hat{f}(0) = \mathbf{E}[f(x)], \qquad \hat{f}(i) = \mathbf{E}[f(x) x_i] \ \text{ for } i = 1, \dots, n,$$

i.e. the $n+1$ degree-0 and degree-1 Fourier coefficients of $f$. (Here and throughout the paper, all
probabilities and expectations are with respect to the uniform distribution over $\{-1,1\}^n$ unless otherwise indicated.)
It is easy to see that in general the Chow parameters of a Boolean function may provide very little
information about $f$; for example, any parity function on at least two variables has all its Chow parameters equal
to 0. However, in a surprising result, C.-K. Chow [Cho61] showed that the Chow parameters of an LTF $f$
uniquely specify $f$ within the space of all Boolean functions mapping $\{-1,1\}^n \to \{-1,1\}$. Chow's proof
(given in Section 3.1) is simple and elegant, but is completely non-constructive; it does not give any clues
as to how one might use the Chow parameters to find $f$ (or an LTF that is close to $f$). This naturally gives
rise to the following algorithmic question, which we refer to as the "Chow Parameters Problem":

The Chow Parameters Problem (rough statement): Given (exact or approximate) values for
the Chow parameters of an unknown LTF $f$, output an (exact or approximate) representation of
$f$ as $\mathrm{sign}(v_1 x_1 + \cdots + v_n x_n - \theta')$.
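To make the definition concrete, the following minimal sketch computes Chow parameters by brute-force enumeration of $\{-1,1\}^n$. The helper name `chow_parameters` and the two example functions are illustrative choices, not part of the paper; the enumeration takes $2^n$ time and is only meant for tiny $n$.

```python
from fractions import Fraction
from itertools import product

def chow_parameters(f, n):
    """Return [f^(0), f^(1), ..., f^(n)] under the uniform distribution on {-1,1}^n."""
    points = list(product((-1, 1), repeat=n))
    total = len(points)
    # Degree-0 coefficient: E[f(x)].
    params = [Fraction(sum(f(x) for x in points), total)]
    # Degree-1 coefficients: E[f(x) * x_i].
    for i in range(n):
        params.append(Fraction(sum(f(x) * x[i] for x in points), total))
    return params

def majority3(x):
    # An LTF: sign(x1 + x2 + x3); the sum is never zero for n = 3.
    return 1 if sum(x) > 0 else -1

def parity3(x):
    # Parity on three variables; not an LTF.
    return x[0] * x[1] * x[2]

maj_chow = chow_parameters(majority3, 3)
par_chow = chow_parameters(parity3, 3)
print(maj_chow)
print(par_chow)
```

The parity function has an all-zero Chow vector, matching the remark above that Chow parameters can carry very little information about a general Boolean function.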



Motivation and Prior Work. We briefly survey some previous research on the Chow Parameters Problem
(see Section 1.1 of [OS11] for a more detailed and extensive account). Motivated by applications in electrical
engineering, the Chow Parameters Problem was intensively studied in the 1960s and early 1970s; several
researchers suggested heuristics of various sorts [Kas63, Win63, KW65, Der65] which were experimentally
analyzed in [Win69]. See [Win71] for a survey covering much of this early work and [Bau73, Hur73] for
some later work from this period.

Researchers in game theory and voting theory rediscovered Chow's theorem in the 1970s [Lap72], and
the theorem and related results have been the subject of study in those communities down to the present
[DS79, EL89, TZ92, Fre97, Lee03, Car04, FM04, TT06, APL07]. Since the Fourier coefficient $\hat{f}(i)$ can
be viewed as representing the "influence" of the $i$-th voter under voting scheme $f$ (under the "Impartial
Culture Assumption" in the theory of social choice, corresponding to the uniform distribution over inputs
$x \in \{-1,1\}^n$), the Chow Parameters Problem corresponds to designing a set of weights for $n$ voters so that
each individual voter has a certain desired level of influence over the final outcome.

In the 1990s and 2000s several researchers in learning theory considered the Chow Parameters
Problem. Birkendorf et al. [BDJ+98] showed that the Chow Parameters Problem is equivalent to the problem of
efficiently learning LTFs under the uniform distribution in the "1-Restricted Focus of Attention (1-RFA)"
model of Ben-David and Dichterman [BDD98] (we give more details on this learning model in Section 8).
Birkendorf et al. showed that if $f$ is an LTF with integer weights of magnitude at most $\mathrm{poly}(n)$, then
estimates of the Chow parameters that are accurate to within an additive $\pm\epsilon/\mathrm{poly}(n)$ information-theoretically
suffice to specify the halfspace $f$ to within $\epsilon$-accuracy. Other information-theoretic results of this flavor
were given by [Gol06, Ser07]. In complexity theory several generalizations of Chow's Theorem were given
in [Bru90, RSOK95], and the Chow parameters play an important role in a recent study [CHIS10] of the
approximation-resistance of linear threshold predicates in the area of hardness of approximation.

Despite this considerable interest in the Chow Parameters Problem from a range of different
communities, the first provably effective and efficient algorithm for the Chow Parameters Problem was only obtained
fairly recently. [OS11] gave a $\mathrm{poly}(n) \cdot 2^{2^{\tilde{O}(1/\epsilon^2)}}$-time algorithm which, given sufficiently accurate estimates
of the Chow parameters of an unknown $n$-variable LTF $f$, outputs an LTF $f'$ that has $\Pr[f(x) \ne f'(x)] \le \epsilon$.

1.2 Our results. In this paper we give a significantly improved algorithm for the Chow Parameters
Problem, whose running time dependence on $\epsilon$ is almost doubly exponentially better than the [OS11] algorithm.
Our main result is the following:

Theorem 1 (Main, informal statement). There is an $\tilde{O}(n^2) \cdot (1/\epsilon)^{O(\log^2(1/\epsilon))} \cdot \log(1/\delta)$-time algorithm $A$
with the following property: Let $f : \{-1,1\}^n \to \{-1,1\}$ be an LTF and let $0 < \epsilon, \delta < 1/2$. If $A$ is
given as input $\epsilon$, $\delta$ and (sufficiently precise estimates of) the Chow parameters of $f$, then $A$ outputs integers
$v_1, \dots, v_n, \theta$ such that with probability at least $1 - \delta$, the linear threshold function
$f^* = \mathrm{sign}(v_1 x_1 + \cdots + v_n x_n - \theta)$ satisfies $\Pr_x[f(x) \ne f^*(x)] \le \epsilon$.

Thus we obtain an efficient randomized polynomial approximation scheme (ERPAS) with a
quasi-polynomial dependence on $1/\epsilon$. We note that for the subclass of LTFs with integer weights of magnitude at
most $\mathrm{poly}(n)$, our algorithm runs in $\mathrm{poly}(n/\epsilon)$ time, i.e. it is a fully polynomial randomized approximation
scheme (FPRAS) (see Section 7.1 for a formal statement). Even for this restricted subclass of LTFs, the
algorithm of [OS11] runs in time doubly exponential in $1/\epsilon$.

Our main result has a range of interesting implications in learning theory. First, it directly gives an
efficient algorithm for learning LTFs in the uniform distribution 1-RFA model. Second, it yields a very
fast agnostic-type algorithm for learning LTFs in the standard uniform distribution PAC model. Both these
algorithms run in time quasi-polynomial in $1/\epsilon$. We elaborate on these learning applications in Section 8.

An interesting feature of our algorithm is that it outputs an LTF with integer weights of magnitude at
most $\sqrt{n} \cdot (1/\epsilon)^{O(\log^2(1/\epsilon))}$. Hence, as a corollary of our approach, we obtain essentially optimal bounds
on approximating arbitrary LTFs using LTFs with small integer weights. It has been known since the 1960s
that every $n$-variable LTF $f$ has an exact representation $\mathrm{sign}(w \cdot x - \theta)$ in which all the weights $w_i$ are
integers satisfying $|w_i| \le 2^{O(n \log n)}$, and Håstad [Has94] has shown that there is an $n$-variable LTF $f$ for
which any integer-weight representation must have each $|w_i| \ge 2^{\Omega(n \log n)}$. However, by settling for an
approximate representation (i.e. a representation $f' = \mathrm{sign}(w \cdot x - \theta)$ such that $\Pr_x[f(x) \ne f'(x)] \le \epsilon$), it
is possible to get away with much smaller integer weights. Servedio [Ser07] showed that every LTF $f$ can be
$\epsilon$-approximated using integer weights each at most $\sqrt{n} \cdot 2^{\tilde{O}(1/\epsilon^2)}$, and this bound was subsequently improved
(as a function of $\epsilon$) to $n^{3/2} \cdot 2^{\tilde{O}(1/\epsilon^{2/3})}$ in [DS09]. (We note that ideas and tools that were developed in work
on low-weight approximators for LTFs have proved useful in a range of other contexts, including hardness of
approximation [FGRW09], property testing [MORS10], and explicit constructions of pseudorandom objects
[DGJ+10].)

Formally, our approach to proving Theorem 1 yields the following nearly-optimal weight bound on
$\epsilon$-approximators for LTFs:

Theorem 2 (Low-weight approximators for LTFs). Let $f : \{-1,1\}^n \to \{-1,1\}$ be any LTF. There is an
LTF $f^* = \mathrm{sign}(v_1 x_1 + \cdots + v_n x_n - \theta)$ such that $\Pr_x[f(x) \ne f^*(x)] \le \epsilon$ and the weights $v_i$ are integers
that satisfy

$$\sum_{i=1}^{n} v_i^2 = n \cdot (1/\epsilon)^{O(\log^2(1/\epsilon))}.$$

The bound on the magnitude of the weights in the above theorem is optimal as a function of $n$ and nearly
optimal as a function of $\epsilon$. Indeed, as shown in [Has94, Gol06], in general any $\epsilon$-approximating LTF $f^*$ for
an arbitrary $n$-variable LTF $f$ may need to have integer weights at least $\max\{\Omega(\sqrt{n}), (1/\epsilon)^{\Omega(\log\log(1/\epsilon))}\}$.
Thus, Theorem 2 nearly closes what was previously an almost exponential gap between the known upper
and lower bounds for this problem. Moreover, the proof of Theorem 2 is constructive (as opposed e.g. to
the one in [DS09]), i.e. there is a randomized $\mathrm{poly}(n) \cdot (1/\epsilon)^{O(\log^2(1/\epsilon))}$-time algorithm that constructs an
$\epsilon$-approximating LTF.

Techniques. We stress that not only are the quantitative results of Theorems 1 and 2 dramatically stronger
than previous work, but the proofs are significantly more self-contained and elementary as well. The [OS11]
algorithm relied heavily on several rather sophisticated results on spectral properties of linear threshold
functions; moreover, its proof of correctness required a careful re-tracing of the (rather involved) analysis of
a fairly complex property testing algorithm for linear threshold functions given in [MORS10]. In contrast,
our proof of Theorem 1 entirely bypasses these spectral results and does not rely on [MORS10] in any way.
Turning to low-weight approximators, the improvement from $2^{\tilde{O}(1/\epsilon^2)}$ in [Ser07] to $2^{\tilde{O}(1/\epsilon^{2/3})}$ in [DS09]
required a combination of rather delicate linear programming arguments and powerful results on the
anti-concentration of sums of independent random variables due to Halász [Hal77]. In contrast, our proof of
Theorem 2 bypasses anti-concentration entirely and does not require any sophisticated linear programming
arguments.

Two main ingredients underlie the proof of Theorem 1. The first is a new structural result relating
the "Chow distance" and the ordinary (Hamming) distance between two functions $f$ and $g$, where $f$ is
an LTF and $g$ is an arbitrary bounded function. The second is a new and simple algorithm which, given
(approximations to) the Chow parameters of an arbitrary Boolean function $f$, efficiently constructs a "linear
bounded function" (LBF) $g$ (a certain type of bounded function) whose "Chow distance" from $f$ is small.
We describe each of these contributions in more detail below.

1.3 The main structural result. In this subsection we first give the necessary definitions regarding Chow
parameters and Chow distance, and then state Theorem 7, our main structural result.

1.3.1 Chow parameters and distance measures. We formally define the Chow parameters of a function
on $\{-1,1\}^n$:

Definition 3. Given any function $f : \{-1,1\}^n \to \mathbb{R}$, its Chow Parameters are the rational numbers
$\hat{f}(0), \hat{f}(1), \dots, \hat{f}(n)$ defined by $\hat{f}(0) = \mathbf{E}[f(x)]$, $\hat{f}(i) = \mathbf{E}[f(x) x_i]$ for $1 \le i \le n$. We say that the
Chow vector of $f$ is $\vec{\chi}_f = (\hat{f}(0), \hat{f}(1), \dots, \hat{f}(n))$.

The Chow parameters naturally induce a distance measure between functions $f, g$:

Definition 4. Let $f, g : \{-1,1\}^n \to \mathbb{R}$. We define the Chow distance between $f$ and $g$ to be
$d_{\mathrm{Chow}}(f, g) \stackrel{\mathrm{def}}{=} \|\vec{\chi}_f - \vec{\chi}_g\|_2$, i.e. the Euclidean distance between the Chow vectors.

This is in contrast with the familiar $L_1$-distance between functions:

Definition 5. The distance between two functions $f, g : \{-1,1\}^n \to \mathbb{R}$ is defined as
$\mathrm{dist}(f, g) \stackrel{\mathrm{def}}{=} \mathbf{E}[|f(x) - g(x)|]$. If $\mathrm{dist}(f, g) \le \epsilon$, we say that $f$ and $g$ are $\epsilon$-close.

We note that if $f, g$ are Boolean functions with range $\{-1,1\}$ then $\mathrm{dist}(f, g) = 2\Pr[f(x) \ne g(x)]$ and
thus dist is equivalent (up to a factor of 2) to the familiar Hamming distance.

1.3.2 The main structural result: small Chow-distance implies small distance. The following fact can
be proved easily using basic Fourier analysis (see Proposition 1.5 in [OS11]):

Fact 6. Let $f, g : \{-1,1\}^n \to \mathbb{R}$. We have that $d_{\mathrm{Chow}}(f, g) \le 2\sqrt{\mathrm{dist}(f, g)}$.
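As a small numerical illustration of Definitions 4 and 5 and of Fact 6, the sketch below computes both distances by enumeration for two Boolean functions on three variables. The helper names and the choice of example functions (majority and a dictator) are assumptions made for illustration only.

```python
from itertools import product
from math import sqrt

def chow_vector(f, n):
    """Chow vector (f^(0), ..., f^(n)) computed by enumerating {-1,1}^n."""
    pts = list(product((-1, 1), repeat=n))
    vec = [sum(f(x) for x in pts) / len(pts)]
    vec += [sum(f(x) * x[i] for x in pts) / len(pts) for i in range(n)]
    return vec

def chow_distance(f, g, n):
    """Euclidean distance between the two Chow vectors (Definition 4)."""
    cf, cg = chow_vector(f, n), chow_vector(g, n)
    return sqrt(sum((a - b) ** 2 for a, b in zip(cf, cg)))

def dist(f, g, n):
    """L1 distance E[|f(x) - g(x)|] under the uniform distribution (Definition 5)."""
    pts = list(product((-1, 1), repeat=n))
    return sum(abs(f(x) - g(x)) for x in pts) / len(pts)

n = 3
maj = lambda x: 1 if sum(x) > 0 else -1   # majority of three bits
dictator = lambda x: x[0]                 # the function x -> x_1

d_chow = chow_distance(maj, dictator, n)
d_l1 = dist(maj, dictator, n)
print(d_chow, d_l1, 2 * sqrt(d_l1))
```

Here the two functions disagree on 2 of the 8 inputs, so the $L_1$ distance is $2 \cdot (1/4) = 1/2$, and the Chow distance stays below $2\sqrt{1/2}$ as Fact 6 predicts.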

Our main structural result, Theorem 7, is essentially a converse which bounds $\mathrm{dist}(f, g)$ in terms of
$d_{\mathrm{Chow}}(f, g)$ when $f$ is an LTF and $g$ is any bounded function:

Theorem 7 (Main Structural Result). Let $f : \{-1,1\}^n \to \{-1,1\}$ be an LTF and $g : \{-1,1\}^n \to [-1,1]$
be any bounded function. If $d_{\mathrm{Chow}}(f, g) \le \epsilon$ then

$$\mathrm{dist}(f, g) \le 2^{-\Omega\left(\sqrt[3]{\log(1/\epsilon)}\right)}.$$

Since Chow's theorem says that if $f$ is an LTF and $g$ is any bounded function then $d_{\mathrm{Chow}}(f, g) = 0$
implies that $\mathrm{dist}(f, g) = 0$, Theorem 7 may be viewed as a "robust" version of Chow's Theorem. Note
that the assumption that $g$ is bounded is necessary for the above statement, since the function
$g(x) = \sum_{i=0}^{n} \hat{f}(i) x_i$ (where $x_0 = 1$) has $d_{\mathrm{Chow}}(f, g) = 0$, but may have $\mathrm{dist}(f, g) = \Omega(1)$. Results of this sort
but with weaker quantitative bounds were given earlier in [BDJ+98, Gol06, Ser07, OS11]; we discuss the
relationship between Theorem 7 and some of this prior work below.
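The necessity of boundedness can be checked on a tiny instance. In the sketch below (an illustration under our own choice of example, with $f$ the majority function on three variables), the degree-1 truncation $g(x) = \sum_i \hat{f}(i) x_i$ has exactly the same Chow vector as $f$, yet it leaves $[-1,1]$ and sits at $L_1$ distance $1/2$ from $f$.

```python
from itertools import product

n = 3
points = list(product((-1, 1), repeat=n))

def maj(x):
    # Majority on three variables, an LTF.
    return 1 if sum(x) > 0 else -1

def chow(h):
    """Chow vector of h; index 0 uses the constant coordinate x_0 = 1."""
    vec = []
    for i in range(n + 1):
        total = sum(h(x) * (1 if i == 0 else x[i - 1]) for x in points)
        vec.append(total / len(points))
    return vec

f_chow = chow(maj)

def g(x):
    # Degree-1 Fourier truncation of maj: sum_i f^(i) x_i with x_0 = 1.
    return f_chow[0] + sum(c * xi for c, xi in zip(f_chow[1:], x))

g_chow = chow(g)
l1_dist = sum(abs(maj(x) - g(x)) for x in points) / len(points)
max_abs_g = max(abs(g(x)) for x in points)
print(f_chow, g_chow, l1_dist, max_abs_g)
```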

Discussion. Theorem 7 should be contrasted with Theorem 1.6 of [OS11], the main structural result of that
paper. That theorem says that for $f : \{-1,1\}^n \to \{-1,1\}$ any LTF and $g : \{-1,1\}^n \to [-1,1]$ any
bounded function,¹ if $d_{\mathrm{Chow}}(f, g) \le \epsilon$ then $\mathrm{dist}(f, g) \le \tilde{O}(1/\sqrt{\log(1/\epsilon)})$. Our new Theorem 7 provides a
bound on $\mathrm{dist}(f, g)$ which is almost exponentially stronger than the [OS11] bound.

Theorem 7 should also be contrasted with Theorem 4 (the main result) of [Gol06], which says that for $f$
an $n$-variable LTF and $g$ any Boolean function, if $d_{\mathrm{Chow}}(f, g) \le (\epsilon/n)^{O(\log(n/\epsilon)\log(1/\epsilon))}$ then $\mathrm{dist}(f, g) \le \epsilon$.
Phrased in this way, Theorem 7 says that for $f$ an LTF and $g$ any bounded function, if
$d_{\mathrm{Chow}}(f, g) \le \epsilon^{O(\log^2(1/\epsilon))}$ then $\mathrm{dist}(f, g) \le \epsilon$. So our main structural result may be viewed as an improvement of
Goldberg's result that removes its dependence on $n$. Indeed, this is not a coincidence; Theorem 7 is proved by
carefully extending and strengthening Goldberg's arguments using the "critical index" machinery developed
in recent studies of structural properties of LTFs [Ser07, OS11, DGJ+10].

It is natural to wonder whether the conclusion of Theorem 7 can be strengthened to "$\mathrm{dist}(f, g) \le \epsilon^c$"
where $c > 0$ is some absolute constant. We show that no such strengthening is possible, and in fact, no
conclusion of the form "$\mathrm{dist}(f, g) \le 2^{-\gamma(\epsilon)}$" is possible for any function $\gamma(\epsilon) = \omega(\log(1/\epsilon)/\log\log(1/\epsilon))$;
we prove this in Section 7.2.

1.4 The algorithmic component. A straightforward inspection of the arguments in [OS11] shows that by
using our new Theorem 7 in place of Theorem 1.6 of that paper throughout, the running time of the [OS11]
algorithm can be improved to $\mathrm{poly}(n) \cdot 2^{(1/\epsilon)^{O(\log^2(1/\epsilon))}}$. This is already a significant improvement over the
$\mathrm{poly}(n) \cdot 2^{2^{\tilde{O}(1/\epsilon^2)}}$ running time of [OS11], but is significantly worse than the $\mathrm{poly}(n) \cdot (1/\epsilon)^{O(\log^2(1/\epsilon))}$
running time which is our ultimate goal.

The second key ingredient of our results is a new algorithm for constructing an LTF from the
(approximate) Chow parameters of an LTF $f$. The previous approach to this problem [OS11] constructed an LTF
with Chow parameters close to $\vec{\chi}_f$ directly and applied the structural result to the constructed LTF. Instead,
our approach is based on the insight that it is substantially easier to find a bounded real-valued function $g$
that is close to $f$ in Chow distance. The structural result can then be applied to $g$ to conclude that $g$ is close
to $f$ in $L_1$-distance. The problem with this idea is, of course, that we need an LTF that is close to $f$ and
not a general bounded function. However, we show that it is possible to find $g$ which is a "linear bounded
function" (LBF), a type of bounded function closely related to LTFs. An LBF can then be easily converted
to an LTF with only a small increase in distance from $f$. We now proceed to define the notion of an LBF
and state our main algorithmic result formally. We first need to define the notion of a projection:

¹ The theorem statement in [OS11] actually requires that $g$ have range $\{-1,1\}$, but the proof is easily seen to
extend to $g : \{-1,1\}^n \to [-1,1]$ as well.

Definition 8. For a real value $a$, we denote its projection to $[-1,1]$ by $P_1(a)$. That is, $P_1(a) = a$ if $|a| \le 1$
and $P_1(a) = \mathrm{sign}(a)$ otherwise.

Definition 9. A function $g : \{-1,1\}^n \to [-1,1]$ is referred to as a linear bounded function (LBF) if there
exists a vector of real values $w = (w_0, w_1, \dots, w_n)$ such that $g(x) = P_1(w_0 + \sum_{i=1}^{n} w_i x_i)$. The vector $w$
is said to represent $g$.
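Definitions 8 and 9 translate directly into code. The following minimal sketch (function names `project` and `make_lbf` are ours, chosen for illustration) builds an LBF from a weight vector and evaluates it at a few points.

```python
def project(a):
    """P_1(a): leave a unchanged if |a| <= 1, otherwise clip it to sign(a)."""
    return max(-1.0, min(1.0, a))

def make_lbf(w):
    """Given w = (w_0, w_1, ..., w_n), return g(x) = P_1(w_0 + sum_i w_i * x_i)."""
    def g(x):
        return project(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))
    return g

small = make_lbf([0.0, 0.5, 0.25])   # the linear form stays inside [-1, 1]
large = make_lbf([0.0, 2.0, 2.0])    # the linear form is clipped at some points

print(small((1, 1)), small((1, -1)))
print(large((1, 1)), large((-1, -1)), large((1, -1)))
```

When the linear form is large in magnitude at a point, the LBF agrees there with the LTF given by the sign of the same linear form, which is one way to see why the two classes are closely related.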

We are now ready to state our main algorithmic result:

Theorem 10 (Main Algorithmic Result). There exists a randomized algorithm ChowReconstruct that
for every Boolean function $f : \{-1,1\}^n \to \{-1,1\}$, given $\epsilon > 0$, $\delta > 0$ and a vector $\vec{\alpha} = (\alpha_0, \alpha_1, \dots, \alpha_n)$
such that $\|\vec{\chi}_f - \vec{\alpha}\| \le \epsilon$, with probability at least $1 - \delta$, outputs an LBF $g$ such that $\|\vec{\chi}_f - \vec{\chi}_g\| \le 6\epsilon$. The
algorithm runs in time $\tilde{O}(n^2 \epsilon^{-4} \log(1/\delta))$. Further, $g$ is represented by a weight vector $\kappa v \in \mathbb{R}^{n+1}$, where
$\kappa \in \mathbb{R}$ and $v$ is an integer vector of length $\|v\| = O(\sqrt{n}/\epsilon^3)$.

We remark that the condition on the weight vector $v$ given by Theorem 10 is the key for the proof of
Theorem 2.

Note that the running time of ChowReconstruct is polynomial in the relation between Chow
distance and $L_1$-distance. By the structural result of [BDJ+98], this implies that for the subclass of LTFs with
integer weights of magnitude bounded by $\mathrm{poly}(n)$, we obtain a $\mathrm{poly}(n/\epsilon)$ time algorithm, i.e. an FPRAS.

Discussion. It is interesting to note that the approach underlying Theorem 10 is much more efficient and
significantly simpler than the algorithmic approach of [OS11]. The algorithm in [OS11] roughly works as
follows: In the first step, it constructs a "small" set of candidate LTFs such that at least one of them is close
to $f$, and in the second step it identifies such an LTF by searching over all such candidates. The first step
proceeds by enumerating over "all" possible weights assigned to the "high influence" variables. This
brute-force search makes the [OS11] algorithm very inefficient. Moreover, its proof of correctness requires some
sophisticated spectral results from [MORS10], which make the approach rather complicated.

In this work, our algorithm is based on a boosting-based approach, which is novel in this context. Our
approach is much more efficient than the brute-force search of [OS11] and its analysis is much simpler,
since it completely bypasses the spectral results of [MORS10]. We also note that the algorithm of [OS11]
crucially depends on the fact that the relation between Chow distance and distance has no dependence on $n$.
(If this were not the case, the approach would not lead to a polynomial time algorithm.) Our boosting-based
approach is quite robust, as it has no such limitation. This fact is crucial for us to obtain the aforementioned
FPRAS for small-weight LTFs.

While we are not aware of any prior results similar to Theorem 10 being stated explicitly, we note that
weaker forms of our theorem can be obtained from known results. In particular, Trevisan et al. [TTV09]
describe an algorithm that, given oracle access to a Boolean function $f$, $\epsilon' > 0$, and a set of functions
$H = \{h_1, h_2, \dots, h_k\}$, efficiently finds a bounded function $g$ that for every $i \le k$ satisfies
$|\mathbf{E}[f \cdot h_i] - \mathbf{E}[g \cdot h_i]| \le \epsilon'$.
One can observe that if $H = \{1, x_1, \dots, x_n\}$, then the function $g$ returned by their algorithm is in fact an
LBF and that the oracle access to $f$ can be replaced with approximate values of $\mathbf{E}[f \cdot h_i]$ for every $i$. Hence,
the algorithm in [TTV09], applied to the set of functions $H = \{1, x_1, x_2, \dots, x_n\}$, would find an LBF $g$
which is close in Chow distance to $f$. A limitation of this algorithm is that, in order to obtain an LBF which
is $\Delta$-close in Chow distance to $f$, it requires that every Chow parameter of $f$ be given to it with accuracy
of $O(\Delta/\sqrt{n})$. In contrast, our algorithm only requires that the total distance of the given vector to $\vec{\chi}_f$ is at
most $\Delta/6$. In addition, the bound on the integer weight approximation of LTFs that can be obtained from
the algorithm in [TTV09] is linear in $n^{3/2}$, whereas we obtain the optimal dependence of $\sqrt{n}$.

The algorithm in [TTV09] is a simple adaptation of the hardcore set construction technique of
Impagliazzo [Imp95]. Our algorithm is also based on the ideas from [Imp95] and, in addition, uses ideas from the
distribution-specific boosting technique in [Fel10].

Our algorithm can be seen as an instance of a more general approach to learning (or approximating)
a function that is based on constructing a bounded function with the given Fourier coefficients. Another
instance of this new approach is the recent algorithm for learning a certain class of polynomial threshold
functions (which includes polynomial-size DNF formulae) from low-degree Fourier coefficients [Fel12].
We note that the algorithm in [Fel12] is based on an algorithm similar to ours. However, like the algorithm
in [TTV09], it requires that every low-degree Fourier coefficient be given to it with high accuracy. As a
result it would be similarly less efficient in our application.

Organization. In Section 2 we record some mathematical preliminaries that will be used throughout the
paper. In Section 3 we present some observations regarding the complexity of solving the Chow Parameters
Problem exactly and give an LP-based $2^{O(n)}$-time algorithm for it. Sections 4 and 5 contain the proof of our
main structural result (Theorem 7). In Section 6 we present our main algorithmic ingredient (Theorem 10).
Section 7 puts the pieces together and proves our main theorem (Theorem 1) and our other main result
(Theorem 2), while Section 8 presents the consequences of our results for learning theory. Finally, in Section 9
we conclude the paper and present a few interesting research directions.



2 Mathematical Preliminaries 

2.1 Probabilistic Facts. We require some basic probability results including the standard additive
Hoeffding bound:

Theorem 11. Let $X_1, \dots, X_n$ be independent random variables such that for each $j \in [n]$, $X_j$ is supported
on $[a_j, b_j]$ for some $a_j, b_j \in \mathbb{R}$, $a_j \le b_j$. Let $X = \sum_{j=1}^{n} X_j$. Then, for any $t > 0$,

$$\Pr\big[|X - \mathbf{E}[X]| \ge t\big] \le 2\exp\Big(-2t^2 \Big/ \sum_{j=1}^{n} (b_j - a_j)^2\Big).$$

The Berry-Esseen theorem (see e.g. [Fel68]) gives explicit error bounds for the Central Limit Theorem:

Theorem 12. (Berry-Esseen) Let $X_1, \dots, X_n$ be independent random variables satisfying $\mathbf{E}[X_i] = 0$ for
all $i \in [n]$, $\sqrt{\sum_i \mathbf{E}[X_i^2]} = \sigma$, and $\sum_i \mathbf{E}[|X_i|^3] = \rho_3$. Let $S = (X_1 + \cdots + X_n)/\sigma$ and let $F$ denote the
cumulative distribution function (cdf) of $S$. Then $\sup_x |F(x) - \Phi(x)| \le \rho_3/\sigma^3$, where $\Phi$ denotes the cdf of
the standard Gaussian random variable.

An easy consequence of the Berry-Esseen theorem is the following fact, which says that a regular linear
form has good anti-concentration (i.e. it assigns small probability mass to any small interval):

Fact 13. Let $w = (w_1, \dots, w_n)$ be a $\tau$-regular vector in $\mathbb{R}^n$ and write $\sigma$ to denote $\|w\|_2$. Then for any
interval $[a, b] \subseteq \mathbb{R}$, we have $\big|\Pr[\sum_{i=1}^{n} w_i x_i \in (a, b]] - \Phi([a/\sigma, b/\sigma])\big| \le 2\tau$, where
$\Phi([c, d]) \stackrel{\mathrm{def}}{=} \Phi(d) - \Phi(c)$. In particular, it follows that

$$\Pr\Big[\sum_{i=1}^{n} w_i x_i \in (a, b]\Big] \le |b - a|/\sigma + 2\tau.$$

2.2 Useful inequalities. We will need the following elementary inequalities.

Fact 14. For $a, b \in (0,1)$, $(ab)^{\log(1/a) + \log(1/b)} \ge a^{2\log(1/a)} \cdot b^{2\log(1/b)}$.

Proof.

$$(ab)^{\log(1/a) + \log(1/b)} = 2^{-\log^2(1/a) - \log^2(1/b) - 2\log(1/a)\log(1/b)}$$
$$\ge 2^{-2\log^2(1/a) - 2\log^2(1/b)}$$
$$= a^{2\log(1/a)} \cdot b^{2\log(1/b)},$$

where the inequality is the arithmetic-geometric mean inequality. □

Similarly, we obtain:

Fact 15. For $x, y \ge 1$, $(x + y)^{-\log(x + y)} \ge (2x)^{-\log(2x)} \cdot (2y)^{-\log(2y)}$.

2.3 Useful facts about affine spaces. A subset $V \subseteq \mathbb{R}^n$ is said to be an affine subspace if it is closed
under affine combinations of vectors in $V$. Equivalently, $V$ is an affine subspace of $\mathbb{R}^n$ if $V = X + b$ where
$b \in \mathbb{R}^n$ and $X$ is a linear subspace of $\mathbb{R}^n$. The affine dimension of $V$ is the same as the dimension of the
linear subspace $X$. A hyperplane in $\mathbb{R}^n$ is an affine space of dimension $n - 1$. Throughout the paper we use
bold capital letters such as $\mathbf{H}$ to denote hyperplanes.

In this paper whenever we refer to a "subspace" we mean an affine subspace unless explicitly otherwise
indicated. The dimension of an affine subspace $V$ is denoted by $\dim(V)$. Similarly, for a set $S \subseteq \mathbb{R}^n$, we
write $\mathrm{span}(S)$ to denote the affine span of $S$, i.e.

$$\mathrm{span}(S) = \Big\{ s + \sum_{i=1}^{m} w_i (x^i - y^i) \;\Big|\; s, x^i, y^i \in S,\ w_i \in \mathbb{R},\ m \in \mathbb{N} \Big\}.$$

The following very useful fact about affine spaces was proved by Odlyzko [Odl88].

Fact 16. [Odl88] Any affine subspace of $\mathbb{R}^n$ of dimension $d$ contains at most $2^d$ elements of $\{-1,1\}^n$.
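For the special case of hyperplanes ($d = n - 1$), Fact 16 is easy to check by enumeration for small $n$. The sketch below is only a sanity check on two hand-picked hyperplanes in $\mathbb{R}^4$, not a proof; the helper name is illustrative.

```python
from itertools import product

def cube_points_on_hyperplane(w, theta):
    """List the points x in {-1,1}^n with w . x = theta (w must be nonzero)."""
    n = len(w)
    return [x for x in product((-1, 1), repeat=n)
            if sum(wi * xi for wi, xi in zip(w, x)) == theta]

n = 4
bound = 2 ** (n - 1)  # Fact 16 with d = n - 1

# The hyperplane x_1 = 1 contains a full subcube, so the bound is attained.
tight = cube_points_on_hyperplane((1, 0, 0, 0), 1)
# The hyperplane x_1 + x_2 + x_3 + x_4 = 0 contains the balanced points only.
balanced = cube_points_on_hyperplane((1, 1, 1, 1), 0)

print(len(tight), len(balanced), bound)
```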

3 On the Exact Chow Parameters Problem 

In this section we make some observations regarding the complexity of the exact version of the Chow
Parameters Problem and present a simple (albeit exponential-time) algorithm for it that beats brute-force
search.

3.1 Proof of Chow's Theorem. For completeness we state and prove Chow's theorem here:

Theorem 17 ([Cho61]). Let $f : \{-1,1\}^n \to \{-1,1\}$ be an LTF and let $g : \{-1,1\}^n \to [-1,1]$ be a
bounded function such that $\hat{g}(j) = \hat{f}(j)$ for all $0 \le j \le n$. Then $g = f$.

Proof. Write $f(x) = \mathrm{sign}(w_0 + w_1 x_1 + \cdots + w_n x_n)$, where the weights are scaled so that $\sum_{j=0}^{n} w_j^2 = 1$.
We may assume without loss of generality that $|w_0 + w_1 x_1 + \cdots + w_n x_n| \ne 0$ for all $x$. (If this is not
the case, first translate the separating hyperplane by slightly perturbing $w_0$ to make it hold; this can be done
without changing $f$'s value on any point of $\{-1,1\}^n$.) Now we have

$$0 = \sum_{j=0}^{n} w_j \big(\hat{f}(j) - \hat{g}(j)\big)$$
$$= \mathbf{E}\big[(w_0 + w_1 x_1 + \cdots + w_n x_n)(f(x) - g(x))\big]$$
$$= \mathbf{E}\big[|f(x) - g(x)| \cdot |w_0 + w_1 x_1 + \cdots + w_n x_n|\big].$$

The first equality is by the assumption that $\hat{f}(j) = \hat{g}(j)$ for all $0 \le j \le n$, the second equality is linearity
of expectation (or Plancherel's identity), and the third equality uses the fact that, whenever $f(x) \ne g(x)$,

$$\mathrm{sign}(f(x) - g(x)) = f(x) = \mathrm{sign}(w_0 + w_1 x_1 + \cdots + w_n x_n)$$

for any bounded function $g$ with range $[-1,1]$. But since $|w_0 + w_1 x_1 + \cdots + w_n x_n|$ is always strictly
positive, we must have $\Pr[f(x) \ne g(x)] = 0$ as claimed. □
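Chow's theorem can be sanity-checked exhaustively for $n = 3$, restricted to Boolean-valued $g$: among all $2^8 = 256$ truth tables, exactly one shares the Chow vector of a given LTF. The sketch below does this for one hand-picked LTF (our choice, for illustration) and contrasts it with parity, whose all-zero Chow vector is shared by several functions. It uses unnormalized integer sums to keep the comparison exact.

```python
from itertools import product

n = 3
points = list(product((-1, 1), repeat=n))

def chow_sums(table):
    """Unnormalized Chow vector (2^n times the Chow parameters) of a truth table."""
    return tuple(
        sum(v * (1 if i == 0 else x[i - 1]) for v, x in zip(table, points))
        for i in range(n + 1)
    )

def functions_with_chow(target):
    """All Boolean truth tables on {-1,1}^n whose Chow sums equal target."""
    return [t for t in product((-1, 1), repeat=len(points)) if chow_sums(t) == target]

# Truth table of the LTF sign(2*x1 + x2 + x3 - 1); the linear form is never zero.
ltf_table = tuple(1 if 2 * x[0] + x[1] + x[2] - 1 > 0 else -1 for x in points)
ltf_matches = functions_with_chow(chow_sums(ltf_table))

# Parity is not an LTF, and its Chow vector does not determine it.
parity_table = tuple(x[0] * x[1] * x[2] for x in points)
parity_matches = functions_with_chow(chow_sums(parity_table))

print(len(ltf_matches), len(parity_matches))
```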

3.2 An exact $2^{O(n)}$-time algorithm. Let us start by pointing out that it is unlikely that the Chow
Parameters Problem can be solved exactly in polynomial time. Note that even checking the correctness of a
candidate solution is #P-complete, because computing $\hat{f}(0)$ is equivalent to counting 0-1 knapsack
solutions. This suggests (but does not logically imply) that the exact problem is intractable; characterizing its
complexity is an interesting open problem (see Section 9).

The naive brute-force approach (enumerate all possible $n$-variable LTFs, and for each one check whether
it has the desired Chow parameters) requires $2^{\Theta(n^2)}$ time. The following proposition gives an improved
(albeit exponential-time) algorithm:

Proposition 18. The Chow Parameters Problem can be solved exactly in time $2^{O(n)}$.

Proof. Let $\alpha_i$, $i = 0, 1, \dots, n$ be the target Chow parameters; we are given the promise that there exists
an LTF $f : \{-1,1\}^n \to \{-1,1\}$ such that $\hat{f}(i) = \alpha_i$ for all $i$. Our goal is to output (a weights-based
representation of) the function $f$. Let $g : \{-1,1\}^n \to [-1,1]$ be a bounded function that has the same
Chow parameters as $f$. We claim that there exists a linear program with $2^n$ variables and $O(2^n)$ constraints
encoding the truth table of $g$. Indeed, for every $x \in \{-1,1\}^n$ we have a variable $g(x)$ and the constraints
are as follows: For all $x \in \{-1,1\}^n$ we include the constraint $-1 \le g(x) \le 1$. We also include the $(n + 1)$
constraints $\mathbf{E}_x[g(x) x_i] = 2^{-n} \sum_{x \in \{-1,1\}^n} g(x) x_i = \alpha_i$, $i = 0, 1, \dots, n$ (where $x_0 = 1$). Chow's theorem
stated above implies that the aforementioned linear program has a unique feasible solution, corresponding
to the truth table of the target LTF $f$. That is, the unique solution of the linear program will be integral and
is identical to the target function. Since the size of the linear program is $2^{O(n)}$ and linear programming is in
P, the truth table of $f$ can thus be computed in time $2^{O(n)}$.

A weight-based representation of $f$ as $\mathrm{sign}(w \cdot x - \theta)$ can then be obtained straightforwardly in time $2^{O(n)}$
by solving another linear program with variables $(w, \theta)$ and $2^n$ constraints, one for each $x \in \{-1,1\}^n$. □



4 Proof overview of main structural result: Theorem 7 

In this section we provide a detailed overview of the proof of Theorem 7, restated here for convenience:

Theorem 7 (Main Structural Result). Let $f : \{-1,1\}^n \to \{-1,1\}$ be an LTF and $g : \{-1,1\}^n \to [-1,1]$
be any bounded function. If $d_{\mathrm{Chow}}(f, g) \le \epsilon$ then $\mathrm{dist}(f, g) \le 2^{-\Omega\left(\sqrt[3]{\log(1/\epsilon)}\right)}$.

We give an informal overview of the main ideas of the proof of Theorem 7 in Section 4.1, and then
proceed with a detailed outline of Theorem 7 in Section 4.2.

4.1 Informal overview of the proof. We first note that throughout the informal explanation given in this
subsection, for the sake of clarity we restrict our attention to the case in which $g : \{-1,1\}^n \to \{-1,1\}$ is a
Boolean rather than a bounded function. In the actual proof we deal with bounded functions using a suitable
weighting scheme for points of $\{-1,1\}^n$ (see the discussion before Fact 28 near the start of the proof of
Theorem 7).

To better explain our approach, we begin with a few words about how Theorem 1.6 of [OS11] (the only
previously known statement of this type that is "independent of $n$") is proved. The key to that theorem is
a result on approximating LTFs using LTFs with "good anti-concentration"; more precisely, [OS11] shows
that for any LTF $f$ there is an LTF $f'(x) = \mathrm{sign}(v \cdot x - \nu)$, $\|v\| = 1$, that is extremely close to $f$ (Hamming
distance roughly $2^{-1/\epsilon}$) and which has "moderately good anti-concentration at radius $\epsilon$," in the sense that
$\Pr[|v \cdot x - \nu| \le \epsilon] \le O(1/\sqrt{\log(1/\epsilon)})$. Given this, Theorem 1.6 of [OS11] is proved using a modification
of the proof of the original Chow's Theorem. However, for this approach based on the original Chow proof
to work, it is crucial that the Hamming distance between $f$ and $f'$ (namely $2^{-1/\epsilon}$) be very small compared
to the anti-concentration radius (which is $\epsilon$). Subject to this constraint it seems very difficult to give a
significant quantitative improvement of the approximation result in a way that would improve the bound of
Theorem 1.6 of [OS11].

Instead, we hew more closely to the approach used to prove Theorem 4 of [Gol06]. This approach also
involves a perturbation of the LTF $f$, but instead of measuring closeness in terms of Hamming distance,
a more direct geometric view is taken. In the rest of this subsection we give a high-level explanation of
Goldberg's proof and of how we modify it to obtain our improved bound.

The key to Goldberg's approach is a (perhaps surprising) statement about the geometry of hyperplanes
as they relate to the Boolean hypercube. He establishes the following key geometric result (see Theorem 21
for a precise statement):

If $H$ is any $n$-dimensional hyperplane such that an $\alpha$ fraction of points in $\{-1,1\}^n$ lie "very
close" in Euclidean distance (essentially $1/\mathrm{quasipoly}(n/\alpha)$) to $H$, then there is a hyperplane
$H'$ which actually contains all those $\alpha 2^n$ points of the hypercube.

With this geometric statement in hand, an iterative argument is used to show that if the Hamming distance
between the LTF $f$ and the Boolean function $g$ is large, then the Euclidean distance between the centers of mass
of (the positive examples for $f$ on which $f$ and $g$ differ) and (the negative examples for $f$ on which $f$ and
$g$ differ) must be large; finally, this Euclidean distance between centers of mass corresponds closely to the
Chow distance between $f$ and $g$.

However, the $1/\mathrm{quasipoly}(n)$ closeness requirement in the key geometric statement means that
Goldberg's Theorem 4 not only depends on $n$, but this dependence is superpolynomial. The heart of our
improvement is to combine Goldberg's key geometric statement with ideas based on the "critical index" of LTFs to
get a version of the statement which is completely independent of $n$. Roughly speaking, our analogue of
Goldberg's key geometric statement is the following (a precise version is given as Lemma 22 below):

If $H$ is any $n$-dimensional hyperplane such that an $\alpha$ fraction of points in $\{-1,1\}^n$ lie within
Euclidean distance $\alpha^{O(\log(1/\alpha))}$ of $H$, then there is a hyperplane $H'$ which contains all but a
tiny fraction of those $\alpha 2^n$ points of the hypercube.

Our statement is much stronger than Goldberg's in that there is no dependence on $n$ in the distance
bound from $H$, but weaker in that we do not guarantee that $H'$ passes through every point; it may miss a tiny
fraction of points, but we are able to handle this in the subsequent analysis. Armed with this improvement,
a careful sharpening of Goldberg's iterative argument (to get rid of another dependence on $n$, unrelated to
the tiny fraction of points missed by $H'$) lets us prove Theorem 7.

4.2 Detailed outline of the proof. As discussed in Section 4.1, the key to proving Theorem 7 is an
improvement of Theorem 3 in [Gol06].

Definition 19. Given a hyperplane $H$ in $\mathbb{R}^n$ and $\beta > 0$, the $\beta$-neighborhood of $H$ is defined as the set of
points in $\mathbb{R}^n$ at Euclidean distance at most $\beta$ from $H$.

We recall the following fact, which shows how to express the Euclidean distance of a point from a
hyperplane using the standard representation of the hyperplane:

Fact 20. Let $H = \{x : w \cdot x - \theta = 0\}$ be a hyperplane in $\mathbb{R}^n$ where $\|w\| = 1$. Then for any $x \in \mathbb{R}^n$, the
Euclidean distance $d(x, H)$ of $x$ from $H$ is $|w \cdot x - \theta|$.
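To make Definition 19 and Fact 20 concrete, here is a minimal sketch that computes the distance of a point from a hyperplane and the fraction $\alpha$ of hypercube vertices lying in its $\beta$-neighborhood by exhaustive enumeration. The function names are hypothetical, and the code divides by $\|w\|$ so that it also accepts a weight vector that is not normalized (an assumption beyond Fact 20, which takes $\|w\| = 1$).

```python
from itertools import product
from math import sqrt


def distance_to_hyperplane(w, theta, x):
    """Euclidean distance from x to {y : w.y = theta}; reduces to |w.x - theta| when ||w|| = 1."""
    norm = sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) - theta) / norm


def fraction_in_neighborhood(w, theta, beta):
    """Fraction of vertices of {-1,1}^n within distance beta of the hyperplane."""
    n = len(w)
    near = sum(1 for x in product((-1, 1), repeat=n)
               if distance_to_hyperplane(w, theta, x) <= beta)
    return near / 2 ** n


# The hyperplane x_1 = x_2 in R^3 contains exactly half of the cube's vertices.
alpha = fraction_in_neighborhood((1.0, -1.0, 0.0), 0.0, 1e-9)
```

The enumeration takes time $2^n$, so this is only meant for experimenting with small examples.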

Theorem 21 (Theorem 3 in [Gol06]). Given any hyperplane in $\mathbb{R}^n$ whose $\beta$-neighborhood contains a subset
$S$ of vertices of $\{-1,1\}^n$, where $|S| = \alpha \cdot 2^n$, there exists a hyperplane which contains all elements of $S$
provided that

$0 < \beta \le \left( (2/\alpha) \cdot n^{5 + \lfloor \log(n/\alpha) \rfloor} \cdot (2 + \lfloor \log(n/\alpha) \rfloor)! \right)^{-1}$.

Before stating our improved version of the above theorem, we define the set $U = \bigcup_{i=1}^{n} \{e_i\} \cup \{\mathbf{0}\}$, where
$\mathbf{0} \in \mathbb{R}^n$ is the all-zeros vector and $e_i \in \mathbb{R}^n$ is the unit vector in the $i$-th direction.
Our improved version of Theorem 21 is the following:

Lemma 22. Let $H$ be a hyperplane in $\mathbb{R}^n$ whose $\beta$-neighborhood contains a subset $S$ of vertices of
$\{-1,1\}^n$, where $|S| = \alpha \cdot 2^n$. Fix $0 < \kappa < \alpha/2$. Then there exists a hyperplane $H'$ in $\mathbb{R}^n$ that
contains a subset $S^* \subseteq S$ of cardinality at least $(\alpha - \kappa) \cdot 2^n$ provided that $0 < \beta \le \beta_0$, where

$\beta_0 \stackrel{\mathrm{def}}{=} (\log(1/\kappa))^{-1/2} \cdot (\log\log(1/\kappa))^{-O(\log\log\log(1/\kappa))} \cdot \alpha^{O(\log(1/\alpha))}$.

Moreover, the coefficient vector defining $H'$ has at most

$O\left( (1/\alpha^2) \cdot (\log\log(1/\kappa) + \log^2(1/\alpha)) \right)$

nonzero coordinates. Further, for any $x \in U$, if $x$ lies on $H$ then $x$ lies on $H'$ as well.

Discussion. We note that while Lemma 22 may appear to be incomparable to Theorem 21 because it "loses"
$\kappa 2^n$ points from the set $S$, in fact by taking $\kappa = 1/2^{n+1}$ it must be the case that our $S^*$ is the same as $S$, and
with this choice of $\kappa$, Lemma 22 gives a strict quantitative improvement of Theorem 21. (We stress that for
our application, though, it will be crucial for us to use Lemma 22 by setting the $\kappa$ parameter to depend only
on $\alpha$, independent of $n$.) We further note that in any statement like Lemma 22 that does not "lose" any points
from $S$, the bound on $\beta$ must necessarily depend on $n$; we show this in Appendix A. Finally, the condition
at the end of Lemma 22 (that if $x \in U$ lies on $H$, then it lies on $H'$ as well) is something we will require
later for technical reasons.

We give the detailed proof of Lemma 22 in Section 5.2. We now briefly sketch the main idea underlying
the proof of the lemma. At a high level, the proof proceeds by reducing the number of variables from $n$
down to

$m = O\left( (1/\alpha^2) \cdot (\log(1/\beta) + \log\log(1/\kappa)) \right)$,

followed by an application of Theorem 45, a technical generalization of Theorem 21 proved in Appendix B,
in $\mathbb{R}^m$. (As we will see later, we use Theorem 45 instead of Theorem 21 because we need to ensure that
points of $U$ which lie on $H$ continue to lie on $H'$.) The reduction uses the notion of the $\tau$-critical index
applied to the vector $w$ defining $H$. (See Section 5.1 for the relevant definitions.)

The idea of the proof is that for coordinates $i$ in the "tail" of $w$ (intuitively, where $|w_i|$ is small) the value
of $x_i$ does not have much effect on $d(x, H)$, and consequently the condition of the lemma must hold true
in a space of much lower dimension than $n$. To show that tail coordinates of $x$ do not have much effect on
$d(x, H)$, we do a case analysis based on the $\tau$-critical index $c(w, \tau)$ of $w$ to show that (in both cases) the
2-norm of the entire "tail" of $w$ must be small. If $c(w, \tau)$ is large, then this fact follows easily by properties
of the $\tau$-critical index. On the other hand, if $c(w, \tau)$ is small we argue by contradiction as follows: by the
definition of the $\tau$-critical index and the Berry-Esseen theorem, the "tail" of $w$ (approximately) behaves like
a normal random variable with standard deviation equal to its 2-norm. Hence, if the 2-norm were large, the
entire linear form $w \cdot x$ would have good anti-concentration, which would contradict the assumption of the
lemma. Thus in both cases, we can essentially ignore the tail and make the effective number of variables be
$m$, which is independent of $n$.

As described earlier, we view the geometric Lemma 22 as the key to the proof of Theorem 7; however,
obtaining Theorem 7 from Lemma 22 requires a delicate iterative argument, which we give in full in the
following section. This argument is essentially a refined version of Theorem 4 of [Gol06] with two main
modifications: one is that we generalize the argument to allow $g$ to be a bounded function rather than a
Boolean function, and the other is that we get rid of various factors of $\sqrt{n}$ which arise in the [Gol06]
argument (and which would be prohibitively "expensive" for us). We give the detailed proof in Section 5.3.

5 Proof of Theorem 7

In this section we provide a detailed proof of our main structural result (Theorem 7).

5.1 Useful Technical Tools. As described above, a key ingredient in the proof of Theorem 7 is the notion
of the "critical index" of an LTF $f$. The critical index was implicitly introduced and used in [Ser07] and
was explicitly used in [DS09, DGJ+10, OS11] and other works. To define the critical index we need to first
define "regularity":

Definition 23 (regularity). Fix $\tau > 0$. We say that a vector $w = (w_1, \ldots, w_n) \in \mathbb{R}^n$ is $\tau$-regular if
$\max_{i \in [n]} |w_i| \le \tau \|w\| = \tau \sqrt{w_1^2 + \cdots + w_n^2}$. A linear form $w \cdot x$ is said to be $\tau$-regular if $w$ is $\tau$-regular,
and similarly an LTF is said to be $\tau$-regular if it is of the form $\mathrm{sign}(w \cdot x - \theta)$ where $w$ is $\tau$-regular.

Regularity is a helpful notion because if $w$ is $\tau$-regular then the Berry-Esseen theorem (stated below)
tells us that for uniform $x \in \{-1,1\}^n$, the linear form $w \cdot x$ is "distributed like a Gaussian up to error
$\tau$." This can be useful for many reasons; in particular, it will let us exploit the strong anti-concentration
properties of the Gaussian distribution.

Intuitively, the critical index of $w$ is the first index $i$ such that from that point on, the vector $(w_i, w_{i+1}, \ldots, w_n)$
is regular. A precise definition follows:

Definition 24 (critical index). Given a vector $w \in \mathbb{R}^n$ such that $|w_1| \ge \cdots \ge |w_n| > 0$, for $k \in [n]$ we
denote by $\sigma_k$ the quantity $\sqrt{\sum_{i=k}^{n} w_i^2}$. We define the $\tau$-critical index $c(w, \tau)$ of $w$ as the smallest index
$i \in [n]$ for which $|w_i| \le \tau \cdot \sigma_i$. If this inequality does not hold for any $i \in [n]$, we define $c(w, \tau) = \infty$.
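Definitions 23 and 24 translate directly into a short computation. The sketch below is only an illustration under the stated ordering assumption $|w_1| \ge \cdots \ge |w_n| > 0$; the helper names `tail_norms`, `is_regular` and `critical_index` are hypothetical.

```python
from math import inf, sqrt


def tail_norms(w):
    """Return [sigma_1, ..., sigma_n] where sigma_k = sqrt(sum_{i >= k} w_i^2)."""
    sigmas = [0.0] * len(w)
    acc = 0.0
    for k in range(len(w) - 1, -1, -1):  # accumulate squared weights from the tail
        acc += w[k] * w[k]
        sigmas[k] = sqrt(acc)
    return sigmas


def is_regular(w, tau):
    """tau-regularity: max_i |w_i| <= tau * ||w||."""
    return max(abs(wi) for wi in w) <= tau * sqrt(sum(wi * wi for wi in w))


def critical_index(w, tau):
    """Smallest 1-based index i with |w_i| <= tau * sigma_i, or inf if there is none.

    Assumes the coordinates are already sorted so that |w_1| >= ... >= |w_n| > 0.
    """
    for i, (wi, sigma) in enumerate(zip(w, tail_norms(w)), start=1):
        if abs(wi) <= tau * sigma:
            return i
    return inf
```

For example, with $\tau = 0.6$ the vector $(8, 4, 1, 1, 1, 1)$ has two "heavy" head coordinates followed by a regular tail, while the geometrically decreasing vector $(8, 4, 2, 1)$ never becomes regular.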

The following simple fact states that the "tail weight" of the vector $w$ decreases exponentially prior to
the critical index:

Fact 25. For any vector $w = (w_1, \ldots, w_n)$ such that $|w_1| \ge \cdots \ge |w_n| > 0$ and $1 \le a \le c(w, \tau)$, we have

$\sigma_a < (1 - \tau^2)^{(a-1)/2} \cdot \sigma_1$.

Proof. If $a < c(w, \tau)$, then by definition $|w_a| > \tau \cdot \sigma_a$. This implies that $\sigma_{a+1} < \sqrt{1 - \tau^2} \cdot \sigma_a$. Applying
this inequality repeatedly, we get that $\sigma_a < (1 - \tau^2)^{(a-1)/2} \cdot \sigma_1$ for any $1 \le a \le c(w, \tau)$. □
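For completeness, the one-step inequality used in the proof of Fact 25 can be written out from the definition of $\sigma_k$ alone. For any index $a < c(w, \tau)$,

\[
\sigma_{a+1}^2 \;=\; \sigma_a^2 - w_a^2 \;<\; \sigma_a^2 - \tau^2 \sigma_a^2 \;=\; (1 - \tau^2)\, \sigma_a^2 ,
\]

and chaining this bound over the indices $1, 2, \ldots, a-1$ gives

\[
\sigma_a^2 \;<\; (1 - \tau^2)^{a-1}\, \sigma_1^2 \qquad \text{for } 2 \le a \le c(w, \tau).
\]

Note that the inequality is strict only once at least one step has been applied; for $a = 1$ the two sides coincide.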

5.2 Proof of Lemma 22. Let $0 < \tau < \alpha$. Let $H = \{x \in \mathbb{R}^n \mid w \cdot x = \theta\}$, where we can assume (by
rescaling) that $\|w\|_2 = 1$ and (by reordering the coordinates) that $|w_1| \ge |w_2| \ge \cdots \ge |w_n|$. Note that
the Euclidean distance of any point $x \in \mathbb{R}^n$ from $H$ is $|w \cdot x - \theta|$. Let us also define $V \stackrel{\mathrm{def}}{=} H \cap U$. Set
$\tau \stackrel{\mathrm{def}}{=} \alpha/4$ (for conceptual clarity we will continue to use "$\tau$" for as long as possible in the arguments below).
We consider the $\tau$-critical index $c(w, \tau)$ of the vector $w \in \mathbb{R}^n$ and proceed by case analysis based on its
value. Fix the parameter $K_0 \stackrel{\mathrm{def}}{=} \Theta\left( (1/\tau^2) \cdot (\log\log(1/\kappa) + \log(1/\beta)) \right)$.
Case I: $c(w, \tau) > K_0$. In this case, we partition $[n]$ into a set of "head" coordinates $H = [K_0]$ and a
complementary set of "tail" coordinates $T = [n] \setminus H$. Writing $w$ as $(w_H, w_T)$ and likewise for $x$, it follows
from Fact 25 that $\|w_T\| \le O(\beta / \sqrt{\log(1/\kappa)})$. By the Hoeffding bound, for a $(1 - \kappa)$ fraction of $x \in \{-1,1\}^n$
we have that $|w_T \cdot x_T| \le \beta$. Therefore, for a $(1 - \kappa)$ fraction of $x \in \{-1,1\}^n$ we have

$|w_H \cdot x_H - \theta| \le |w \cdot x - \theta| + |w_T \cdot x_T| \le |w \cdot x - \theta| + \beta$.



By the assumption of the lemma, there exists a set $S \subseteq \{-1,1\}^n$ of cardinality at least $\alpha \cdot 2^n$ such that
for all $x \in S$ we have $|w \cdot x - \theta| \le \beta$. A union bound and the above inequality imply that there exists a set
$S^* \subseteq S$ of cardinality at least $(\alpha - \kappa) \cdot 2^n$ with the property that for all $x \in S^*$, we have

$|w_H \cdot x_H - \theta| \le 2\beta$.

Also, any $x \in U$ satisfies $\|x_T\| \le 1$. Hence for any $x \in V$, we have that

$|w_H \cdot x_H - \theta| \le |w \cdot x - \theta| + |w_T \cdot x_T| = |w_T \cdot x_T| \le \|w_T\| \cdot \|x_T\| \le O(\beta / \sqrt{\log(1/\kappa)}) \le \beta$.

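Define the projection mapping $\phi_H : \mathbb{R}^n \to \mathbb{R}^{|H|}$ by $\phi_H : x \mapsto x_H$ and consider the image of $S^*$, i.e.
$S' \stackrel{\mathrm{def}}{=} \phi_H(S^*)$. It is clear that $|S'| \ge (\alpha - \kappa) \cdot 2^{|H|}$ and that for all $x_H \in S'$, we have

$|w_H \cdot x_H - \theta| \le 2\beta$.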

Similarly, if $V'$ is the image of $V$ under $\phi_H$, then for every $x_H \in V'$ we have $|w_H \cdot x_H - \theta| \le \beta$. It is also
clear that $\|w_T\| < 1/2$ and hence $\|w_H\| > 1/2$. Thus for every $x_H \in (S' \cup V')$ we have

$\left| \dfrac{w_H \cdot x_H}{\|w_H\|} - \dfrac{\theta}{\|w_H\|} \right| \le 4\beta$.



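We now define the $K_0$-dimensional hyperplane $H_H$ as $H_H \stackrel{\mathrm{def}}{=} \{x_H \in \mathbb{R}^{|H|} \mid w_H \cdot x_H = \theta\}$. As all
points in $S' \cup V'$ are in the $4\beta$-neighborhood of $H_H$, we may now apply Theorem 45 for the hyperplane
$H_H$ over $\mathbb{R}^{|H|}$ to deduce the existence of an alternate hyperplane $H'_H = \{x_H \in \mathbb{R}^{|H|} \mid v_H \cdot x_H = \nu\}$ that
contains all points in $S' \cup V'$. The only condition we need to verify in order that Theorem 45 may be applied
is that $4\beta$ is upper bounded by

$\left( \dfrac{2}{\alpha - \kappa} \cdot K_0^{\,5 + \lfloor \log(K_0/(\alpha - \kappa)) \rfloor} \cdot \left(2 + \lfloor \log(K_0/(\alpha - \kappa)) \rfloor\right)! \right)^{-1}$.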

In the following $C_1$, $C_2$, etc. denote unspecified absolute positive constants. Using $\kappa \le \alpha/2$, it suffices to
ensure

$\beta \le (\alpha / K_0)^{C_1 \log(K_0/\alpha)}$.

Recalling that $\tau = \alpha/4$ and plugging in the value of $K_0$ in terms of $\alpha$, $\kappa$ and $\beta$, we need to verify that

$\beta \le \left( \dfrac{\alpha^3}{\log\log(1/\kappa) + \log(1/\beta)} \right)^{C_2 \left( \log(1/\alpha^3) + \log(\log\log(1/\kappa) + \log(1/\beta)) \right)}$.

Using Fact 14, we get that the right-hand side is lower bounded by

$\alpha^{C_3 \log(1/\alpha)} \cdot \left( \log\log(1/\kappa) + \log(1/\beta) \right)^{-C_3 \log(\log\log(1/\kappa) + \log(1/\beta))}$.
Using Fact 15, we get that the above expression is lower bounded by

$\alpha^{C_4 \log(1/\alpha)} \cdot (\log\log(1/\kappa))^{-C_4 \log\log\log(1/\kappa)} \cdot (\log(1/\beta))^{-C_4 \log\log(1/\beta)}$.

Thus it suffices to verify that

$\beta \le \alpha^{C_4 \log(1/\alpha)} \cdot (\log\log(1/\kappa))^{-C_4 \log\log\log(1/\kappa)} \cdot (\log(1/\beta))^{-C_4 \log\log(1/\beta)}$.

It is easy to see that for

$\beta \le \alpha^{O(\log(1/\alpha))} \cdot (\log\log(1/\kappa))^{-O(\log\log\log(1/\kappa))}$

(with sufficiently large constants inside the $O(\cdot)$ notation), the above inequality is indeed true, and hence it
is true for $\beta \le \beta_0$.

Thus, we get a new hyperplane $H'_H = \{x_H \in \mathbb{R}^{|H|} \mid v_H \cdot x_H = \nu\}$ that contains all points in $S' \cup V'$.
It is then clear that the $n$-dimensional hyperplane $H' = \{x \in \mathbb{R}^n \mid v_H \cdot x_H = \nu\}$ contains all the points
in $S^* = (\phi_H)^{-1}(S')$ and the points in $V$, and that the vector $v_H$ defining $H'$ has the claimed number of
nonzero coordinates. So the lemma is proved in Case I.

Case II: $c(w, \tau) \le K_0$. In this case, we partition $[n]$ into "head" and "tail" based on the value of $c(w, \tau)$ by
taking $H = [c(w, \tau)]$ and $T = [n] \setminus H$. We use the fact that $w_T$ is $\tau$-regular to deduce that the norm of the
tail must be small.

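Claim 26. We have $\|w_T\|_2 \le 2\beta / (\alpha - 3\tau) = 8\beta/\alpha$.

Proof. Suppose for the sake of contradiction that

$\|w_T\|_2 > 2\beta / (\alpha - 3\tau)$.

By the Berry-Esseen theorem (Theorem 12, or more precisely Fact 13), for all $\delta > 0$ we have

$\sup_{t \in \mathbb{R}} \Pr_{x_T}\left[ |w_T \cdot x_T - t| \le \delta \right] \le \dfrac{2\delta}{\|w_T\|} + 2\tau$.

By setting $\delta \stackrel{\mathrm{def}}{=} (\alpha - 3\tau)\|w_T\|/2 > \beta$ we get that

$\sup_{t \in \mathbb{R}} \Pr_{x_T}\left[ |w_T \cdot x_T - t| \le \delta \right] < \alpha$,

and consequently

$\Pr_x\left[ |w \cdot x - \theta| \le \beta \right] \le \sup_{t \in \mathbb{R}} \Pr_{x_T}\left[ |w_T \cdot x_T - t| \le \beta \right] \le \sup_{t \in \mathbb{R}} \Pr_{x_T}\left[ |w_T \cdot x_T - t| \le \delta \right] < \alpha$,

which contradicts the existence of the set $S$ in the statement of the lemma. □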

The rest of the proof proceeds similarly to Case I. By the Hoeffding bound, for a $1 - \kappa$ fraction of
$x \in \{-1,1\}^n$ we have

$|w_H \cdot x_H - \theta| \le |w \cdot x - \theta| + \beta'$,

where $\beta' = O\left( (\beta/\alpha) \cdot \sqrt{\log(1/\kappa)} \right)$. By the assumption of the lemma and a union bound, there exists a
set $S^* \subseteq S$ of cardinality at least $(\alpha - \kappa) \cdot 2^n$ with the property that for all $x \in S^*$ we have

$|w_H \cdot x_H - \theta| \le \beta' + \beta$.






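Turning to $V$, for every point $x \in V$ we have that $|w_H \cdot x_H - \theta| \le |w \cdot x - \theta| + |w_T \cdot x_T| = |w_T \cdot x_T|$. For $x \in V$
the value $w_T \cdot x_T$ is either $0$ (if $x = \mathbf{0}$ or $x = e_i$ with $i \in H$) or $(w_T)_i$ (if $x = e_i$ for some $i \in T$). Since $w_T$ is $\tau$-regular we have
$|(w_T)_i| \le \tau \cdot \|w_T\| \le (\alpha/4) \cdot (8\beta/\alpha) = 2\beta$, so for every $x \in V$ we have $|w_H \cdot x_H - \theta| \le 2\beta \le \beta + \beta'$.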

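As before, we define the projection mapping $\phi_H : \mathbb{R}^n \to \mathbb{R}^{|H|}$ by $\phi_H : x \mapsto x_H$. We let $S' \stackrel{\mathrm{def}}{=} \phi_H(S^*)$
and $V' \stackrel{\mathrm{def}}{=} \phi_H(V)$. It is clear that $|S'| \ge (\alpha - \kappa) \cdot 2^{|H|}$ and that for all $x_H \in (S' \cup V')$ we have

$|w_H \cdot x_H - \theta| \le \beta' + \beta$,

and that for all $x_H \in V'$, $|w_H \cdot x_H - \theta| \le 2\beta$. We now define the $|H|$-dimensional hyperplane $H_H$ as
$H_H \stackrel{\mathrm{def}}{=} \{x_H \in \mathbb{R}^{|H|} \mid w_H \cdot x_H = \theta\}$. As before, we note that $\|w_T\| < 1/2$ and hence $\|w_H\| > 1/2$.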
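Hence, every point $x_H \in S' \cup V'$ is $2(\beta + \beta') \le 4\beta'$ close to $H_H$. As all points in $S' \cup V'$ are $4\beta'$
close to $H_H$, we may now apply Theorem 45 over $\mathbb{R}^{|H|}$ to deduce the existence of an alternate hyperplane
$H'_H \stackrel{\mathrm{def}}{=} \{x_H \in \mathbb{R}^{|H|} \mid v_H \cdot x_H = \nu\}$ that contains all points in $S'$ and $V'$. The only condition we need to
verify is that $4\beta'$ is at most

$\left( \dfrac{2}{\alpha - \kappa} \cdot |H|^{\,5 + \lfloor \log(|H|/(\alpha - \kappa)) \rfloor} \cdot \left(2 + \lfloor \log(|H|/(\alpha - \kappa)) \rfloor\right)! \right)^{-1}$.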

As $\beta' = O\left( (\beta \cdot \sqrt{\log(1/\kappa)}) / \alpha \right)$, doing a calculation akin to the calculation in Case I (now using $|H| \le K_0$)
we get that the above inequality is true for

$\beta \le (\log(1/\kappa))^{-1/2} \cdot \alpha^{O(\log(1/\alpha))} \cdot (\log\log(1/\kappa))^{-O(\log\log\log(1/\kappa))}$

as long as the constants inside the $O(\cdot)$ notation are sufficiently large. (It is instructive to note here that it
is Case II which is the "bottleneck" for our overall bound, in the sense that we require a stronger upper
bound on $\beta$ for Case II than for Case I.) It is now clear that the $n$-dimensional hyperplane $H' = \{x \in \mathbb{R}^n \mid
v_H \cdot x_H = \nu\}$ contains all the points in $S^* = (\phi_H)^{-1}(S')$ and the points in $V$, and has the claimed number
of nonzero coordinates. This proves the lemma in Case II and concludes the proof of Lemma 22.

5.3 Proof of Theorem 7. As mentioned in the body of the paper, our proof is essentially a refined version
of Theorem 4 of [Gol06] with two main modifications: one is that we generalize Goldberg's arguments to
allow $g$ to be a bounded function rather than a Boolean function, and the other is that we get rid of various
factors of $\sqrt{n}$ which arise in the [Gol06] argument (and which would be prohibitively "expensive" for us).
The key to getting rid of these factors is the following simple lemma:

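Lemma 27. Let $S \subseteq \{-1,1\}^n$ and $W : S \to [0, 2]$ be such that $\sum_{x \in S} W(x) = \delta 2^n$. Also, let $v \in \mathbb{R}^n$ have
$\|v\| = 1$. Then

$\sum_{x \in S} W(x) \cdot |v \cdot x| = O\left( \delta \sqrt{\log(1/\delta)} \right) \cdot 2^n$.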

Proof. For any $x \in S$, let $D(x) = W(x) / \left( \sum_{x \in S} W(x) \right)$. Clearly, $D$ defines a probability distribution over
$S$. By definition, $\mathbf{E}_{x \sim D}[|v \cdot x|] = \left( \sum_{x \in S} W(x) \cdot |v \cdot x| \right) / \left( \sum_{x \in S} W(x) \right)$. Since $\sum_{x \in S} W(x) = \delta \cdot 2^n$, to
prove the lemma it suffices to show that $\mathbf{E}_{x \sim D}[|v \cdot x|] = O(\sqrt{\log(1/\delta)})$. Recall that for any non-negative
random variable $Y$, we have the identity $\mathbf{E}[Y] = \int_{t \ge 0} \Pr[Y > t] \, dt$. Thus, we have

$\mathbf{E}_{x \sim D}[|v \cdot x|] = \int_{t \ge 0} \Pr_{x \sim D}[|v \cdot x| > t] \, dt$.

To bound this quantity, we exploit the fact that the integrand is concentrated. Indeed, by the Hoeffding
bound we have that

$\Pr_{x \sim \{-1,1\}^n}[|v \cdot x| > t] \le 2 e^{-t^2/2}$.
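This implies that the set $A = \{x \in \{-1,1\}^n : |v \cdot x| > t\}$ is of size at most $2 e^{-t^2/2} 2^n$. Since $W(x) \le 2$ for
all $x \in S$, we have that $\sum_{x \in A \cap S} W(x) \le 4 e^{-t^2/2} 2^n$. This implies that $\Pr_{x \sim D}[|v \cdot x| > t] \le (4/\delta) \cdot e^{-t^2/2}$.
The following chain of inequalities completes the proof:

$\mathbf{E}_{x \sim D}[|v \cdot x|] = \int_{t=0}^{\sqrt{2\ln(1/\delta)}} \Pr_{x \sim D}[|v \cdot x| > t] \, dt + \int_{t \ge \sqrt{2\ln(1/\delta)}} \Pr_{x \sim D}[|v \cdot x| > t] \, dt$

$\le \sqrt{2\ln(1/\delta)} + \int_{t \ge \sqrt{2\ln(1/\delta)}} \Pr_{x \sim D}[|v \cdot x| > t] \, dt$

$\le \sqrt{2\ln(1/\delta)} + \int_{t \ge \sqrt{2\ln(1/\delta)}} \dfrac{4 e^{-t^2/2}}{\delta} \, dt$

$\le \sqrt{2\ln(1/\delta)} + \int_{t \ge \sqrt{2\ln(1/\delta)}} \dfrac{4 t e^{-t^2/2}}{\delta} \, dt = \sqrt{2\ln(1/\delta)} + 4$.

(The last inequality uses $t \ge 1$ on the range of integration, which holds whenever $\delta \le e^{-1/2}$.)

□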
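The bound in the proof of Lemma 27 can be sanity-checked numerically on a small instance. The sketch below is an illustration only: it fixes $v = (1/\sqrt{n}, \ldots, 1/\sqrt{n})$, puts the maximum weight $W(x) = 2$ on the points with the largest $|v \cdot x|$ (a natural "bad" choice of $S$, chosen here as an assumption), and compares $\mathbf{E}_{x \sim D}[|v \cdot x|]$ with $\sqrt{2\ln(1/\delta)} + 4$. The function name is hypothetical.

```python
from itertools import product
from math import log, sqrt


def weighted_projection_mean(n, delta):
    """E_{x~D}[|v.x|] when W = 2 on the delta*2^n/2 cube points with largest |v.x|."""
    v = [1 / sqrt(n)] * n
    projections = sorted(
        (abs(sum(vi * xi for vi, xi in zip(v, x))) for x in product((-1, 1), repeat=n)),
        reverse=True,
    )
    size = int(delta * 2 ** n / 2)   # |S|, so that the total weight 2*|S| equals delta * 2^n
    total_weight = 2 * size
    mass = sum(2 * t for t in projections[:size])  # sum over S of W(x) * |v.x|
    return mass / total_weight


delta = 1 / 16
mean_abs = weighted_projection_mean(10, delta)
bound = sqrt(2 * log(1 / delta)) + 4
```

On this instance the mean is close to $\sqrt{2\ln(1/\delta)}$ itself, which suggests that the additive constant in the bound is generous here while the $\sqrt{\log(1/\delta)}$ scaling is the essential part.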

We are now ready to prove Theorem 7.

Proof of Theorem 7. Let $f : \{-1,1\}^n \to \{-1,1\}$ be an LTF and $g : \{-1,1\}^n \to [-1,1]$ be an arbitrary
bounded function. Assuming that $\mathrm{dist}(f, g) = \epsilon$, we will prove that $d_{\mathrm{Chow}}(f, g) \ge \delta = \delta(\epsilon) \stackrel{\mathrm{def}}{=} \epsilon^{O(\log^2(1/\epsilon))}$.

Let us define $V_+ = \{x \in \{-1,1\}^n \mid f(x) = 1, g(x) < 1\}$ and $V_- = \{x \in \{-1,1\}^n \mid f(x) =
-1, g(x) > -1\}$. Also, for every point $x \in \{-1,1\}^n$, we associate a weight $W(x) = |f(x) - g(x)|$, and for
a set $S$, we define $W(S) \stackrel{\mathrm{def}}{=} \sum_{x \in S} W(x)$.

It is clear that $V_+ \cup V_-$ is the disagreement region between $f$ and $g$ and that therefore $W(V_+) + W(V_-) =
\epsilon \cdot 2^n$. We claim that without loss of generality we may assume that $(\epsilon - \delta) \cdot 2^{n-1} \le W(V_+), W(V_-) \le
(\epsilon + \delta) \cdot 2^{n-1}$. Indeed, if this condition is not satisfied, we have that $|\hat{f}(0) - \hat{g}(0)| > \delta$, which gives the
conclusion of the theorem.

We record the following straightforward fact, which shall be used several times subsequently.

Fact 28. For $W$ as defined above, for all $X \subseteq \{-1,1\}^n$, $|X| \ge W(X)/2$.

We start by defining $V^0_+ = V_+$, $V^0_- = V_-$ and $V^0 = V^0_+ \cup V^0_-$. The following simple proposition will
be useful throughout the proof, since it characterizes the Chow distance between $f$ and $g$ (excluding the
degree-0 coefficients) as the (normalized) Euclidean distance between two well-defined points in $\mathbb{R}^n$:

Proposition 29. Let $\mu_+ = \sum_{x \in V_+} W(x) \cdot x$ and $\mu_- = \sum_{x \in V_-} W(x) \cdot x$. Then

$\sum_{i=1}^{n} (\hat{f}(i) - \hat{g}(i))^2 = 2^{-2n} \|\mu_+ - \mu_-\|^2$.

Proof. For $i \in [n]$ we have that $\hat{f}(i) = \mathbf{E}[f(x) x_i]$ and hence $\hat{f}(i) - \hat{g}(i) = \mathbf{E}[(f(x) - g(x)) x_i]$. Hence
$2^n (\hat{f}(i) - \hat{g}(i)) = \sum_{x \in V_+} W(x) \cdot x_i - \sum_{x \in V_-} W(x) \cdot x_i = (\mu_+ - \mu_-) \cdot e_i$, where $(\mu_+ - \mu_-) \cdot e_i$ is the
inner product of the vector $\mu_+ - \mu_-$ with the unit vector $e_i$. Since $e_1, \ldots, e_n$ form a complete orthonormal
basis for $\mathbb{R}^n$, it follows that

$\|\mu_+ - \mu_-\|^2 = 2^{2n} \sum_{i \in [n]} (\hat{f}(i) - \hat{g}(i))^2$,

proving the claim. □
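The identity of Proposition 29 is easy to check numerically. The following sketch draws a random halfspace $f$ and a random bounded function $g$ on a small cube and evaluates both sides; the function name, the dimension, the threshold value and the random seed are illustrative assumptions.

```python
import random
from itertools import product


def check_proposition_29(n=4, seed=0):
    """Return both sides of sum_i (f^(i) - g^(i))^2 = 2^(-2n) ||mu_+ - mu_-||^2."""
    rng = random.Random(seed)
    points = list(product((-1, 1), repeat=n))
    w, theta = [rng.uniform(-1, 1) for _ in range(n)], 0.1
    f = {x: (1 if sum(wi * xi for wi, xi in zip(w, x)) - theta >= 0 else -1) for x in points}
    g = {x: rng.uniform(-1, 1) for x in points}   # arbitrary bounded function
    W = {x: abs(f[x] - g[x]) for x in points}     # weight W(x) = |f(x) - g(x)|

    # Weighted sums over the positive and negative disagreement regions.
    # Points with W(x) = 0 contribute nothing, so filtering on f alone is enough.
    mu_plus = [sum(W[x] * x[i] for x in points if f[x] == 1) for i in range(n)]
    mu_minus = [sum(W[x] * x[i] for x in points if f[x] == -1) for i in range(n)]

    lhs = sum((sum((f[x] - g[x]) * x[i] for x in points) / 2 ** n) ** 2 for i in range(n))
    rhs = sum((p - m) ** 2 for p, m in zip(mu_plus, mu_minus)) / 4 ** n
    return lhs, rhs


lhs, rhs = check_proposition_29()
```

The two returned values should agree up to floating-point rounding for any choice of seed, since the identity does not depend on $f$ being a halfspace, only on $f$ taking values in $\{-1, 1\}$.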

If $\eta \in \mathbb{R}^n$ has $\|\eta\| = 1$ then it is clear that $\|\mu_+ - \mu_-\| \ge (\mu_+ - \mu_-) \cdot \eta$. By Proposition 29, to lower
bound the Chow distance $d_{\mathrm{Chow}}(f, g)$, it suffices to establish a lower bound on $(\mu_+ - \mu_-) \cdot \eta$ for a unit
vector $\eta$ of our choice.
Before proceeding with the proof we fix some notation. For any line $\ell$ in $\mathbb{R}^n$ and point $x \in \mathbb{R}^n$,
we let $\ell(x)$ denote the projection of the point $x$ on the line $\ell$. For a set $X \subseteq \mathbb{R}^n$ and a line $\ell$ in $\mathbb{R}^n$,
$\ell(X) \stackrel{\mathrm{def}}{=} \{\ell(x) : x \in X\}$. We use $\hat{\ell}$ to denote the unit vector in the direction of $\ell$ (its orientation is irrelevant
for us).

Definition 30. For a function $W : \{-1,1\}^n \to [0, \infty)$, a set $X \subseteq \{-1,1\}^n$ is said to be $(\epsilon, \nu)$-balanced if

$(\epsilon - \nu) 2^{n-1} \le \sum_{x \in X} W(x) \le (\epsilon + \nu) 2^{n-1}$.

Whenever we say that a set $X$ is $(\epsilon, \nu)$-balanced, the associated function $W$ is implicitly assumed to be
the one defined at the start of the proof of Theorem 7. The following proposition will be very useful during
the course of the proof.

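Proposition 31. Let $X_1, X_2 \subseteq \{-1,1\}^n$ be $(\epsilon, \nu)$-balanced sets where $\nu \le \epsilon/8$. Let $\ell$ be a line in $\mathbb{R}^n$
and $q \in \ell$ be a point on $\ell$ such that the sets $\ell(X_1)$ and $\ell(X_2)$ lie on opposite sides of $q$. Suppose that
$S \stackrel{\mathrm{def}}{=} \{x \mid x \in X_1 \cup X_2 \text{ and } \|\ell(x) - q\| \ge \beta\}$. If $\sum_{x \in S} W(x) \ge \gamma 2^n$, then for $\mu_1 = \sum_{x \in X_1} W(x) \cdot x$ and
$\mu_2 = \sum_{x \in X_2} W(x) \cdot x$, we have

$|(\mu_1 - \mu_2) \cdot \hat{\ell}| \ge \left( \beta\gamma - \nu \sqrt{2\ln(16/\epsilon)} \right) 2^n$.

In particular, for $\nu \sqrt{2\ln(16/\epsilon)} \le \beta\gamma/2$, we have $|(\mu_1 - \mu_2) \cdot \hat{\ell}| \ge (\beta\gamma/2) 2^n$.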

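Proof. We may assume that the projection $\ell(x)$ of any point $x \in X_1$ on $\ell$ is of the form $q + \lambda_x \hat{\ell}$ where
$\lambda_x > 0$, and that the projection $\ell(x)$ of any point $x \in X_2$ on $\ell$ is of the form $q - \lambda_x \hat{\ell}$ where $\lambda_x > 0$. We can
thus write

$(\mu_1 - \mu_2) \cdot \hat{\ell} = \sum_{x \in X_1} W(x) (q \cdot \hat{\ell} + \lambda_x) - \sum_{x \in X_2} W(x) (q \cdot \hat{\ell} - \lambda_x)$

$= (W(X_1) - W(X_2)) \, q \cdot \hat{\ell} + \sum_{x \in X_1 \cup X_2} W(x) \cdot \lambda_x$.

By the triangle inequality we have

$|(\mu_1 - \mu_2) \cdot \hat{\ell}| \ge \sum_{x \in X_1 \cup X_2} W(x) \cdot \lambda_x - |q \cdot \hat{\ell}| \cdot |W(X_1) - W(X_2)|$,

so it suffices to bound each term separately. For the first term we can write

$\sum_{x \in X_1 \cup X_2} W(x) \cdot \lambda_x \ge \sum_{x \in S} W(x) \cdot \lambda_x \ge \beta\gamma 2^n$.

To bound the second term, we first recall that (by assumption) $|W(X_1) - W(X_2)| \le \nu 2^n$. Also, we claim
that $|q \cdot \hat{\ell}| < \sqrt{2\ln(16/\epsilon)}$. This is because otherwise the function defined by $g(x) = \mathrm{sign}(x \cdot \hat{\ell} - q \cdot \hat{\ell})$
will be $\epsilon/8$ close to a constant function on $\{-1,1\}^n$. In particular, at least one of $|X_1|$, $|X_2|$ must be at most
$(\epsilon/8) 2^n$. However, by Fact 28, for $i = 1, 2$ we have that $|X_i| \ge W(X_i)/2 \ge (\epsilon/4 - \nu/4) 2^n > (\epsilon/8) 2^n$,
resulting in a contradiction. Hence it must be the case that $|q \cdot \hat{\ell}| < \sqrt{2\ln(16/\epsilon)}$. This implies that
$|(\mu_1 - \mu_2) \cdot \hat{\ell}| \ge \left( \beta\gamma - \nu \sqrt{2\ln(16/\epsilon)} \right) 2^n$ and the proposition is proved. □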



We consider a separating hyperplane $A_0$ for $f$ and assume (without loss of generality) that $A_0$ does not
contain any points of the unit hypercube $\{-1,1\}^n$. Let $A_0 = \{x \in \mathbb{R}^n \mid w \cdot x = \theta\}$, where $\|w\| = 1$, $\theta \in \mathbb{R}$
and $f(x) = \mathrm{sign}(w \cdot x - \theta)$.

Consider a line $\ell_0$ normal to $A_0$, so $w$ is the unit vector defining the direction of $\ell_0$ that points to
the halfspace $f^{-1}(1)$. As stated before, the exact orientation of $\ell_0$ is irrelevant to us and the choice of
orientation here is arbitrary. Let $q_0 \in \mathbb{R}^n$ be the intersection point of $\ell_0$ and $A_0$. Then we can write the line
$\ell_0$ as $\ell_0 = \{p \in \mathbb{R}^n \mid p = q_0 + \lambda w, \lambda \in \mathbb{R}\}$.

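Define $\beta \stackrel{\mathrm{def}}{=} \epsilon^{O(\log(1/\epsilon))}$ and consider the set of points

$S_0 = \{x : x \in V^0 \mid \|\ell_0(x) - q_0\| \ge \beta\}$.

The following claim states that if $W(S_0)$ is not very small, we get the desired lower bound on the Chow
distance.

Claim 32. Suppose that $W(S_0) \ge \gamma_0 \cdot 2^n$ where $\gamma_0 \stackrel{\mathrm{def}}{=} \beta^{4\log(1/\epsilon) - 2} \cdot \epsilon$. Then $d_{\mathrm{Chow}}(f, g) \ge \delta$.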

Proof. To prove the desired lower bound, we will apply Proposition 29. Consider projecting every point in
$V^0$ on the line $\ell_0$. Observe that the projections of $V^0_+$ are separated from the projections of $V^0_-$ by the point
$q_0$. Also, we recall that the sets $V^0_+$ and $V^0_-$ are $(\epsilon, \delta)$-balanced. Thus, if we define $\mu_+ = \sum_{x \in V^0_+} W(x) \cdot x$ and
$\mu_- = \sum_{x \in V^0_-} W(x) \cdot x$, we can apply Proposition 31 to get that $|(\mu_+ - \mu_-) \cdot w| \ge \left( \beta\gamma_0 - \delta \sqrt{2\ln(16/\epsilon)} \right) 2^n \ge
\delta 2^n$. This implies that $\|\mu_+ - \mu_-\|^2 \ge \delta^2 2^{2n}$, and using Proposition 29, this proves that $d_{\mathrm{Chow}}(f, g) \ge \delta$. □

If the condition of Claim 32 is not satisfied, then we have that $W(V^0 \setminus S_0) > (\epsilon - \gamma_0) 2^n$. By Fact 28,
we have $|V^0 \setminus S_0| > (\epsilon - \gamma_0) 2^{n-1}$. We now apply Lemma 22 to obtain another hyperplane $A_1$ which passes
through all but $\kappa_1 \cdot 2^n$ points ($\kappa_1 \stackrel{\mathrm{def}}{=} \gamma_0/2$) in $V^0 \setminus S_0$. We note that the condition of the lemma is satisfied,
as $\log(1/\kappa_1) = \mathrm{poly}(\log(1/\epsilon))$ and $|V^0 \setminus S_0| > (\epsilon/4) \cdot 2^n$.

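From this point onwards, our proof uses a sequence of $\lfloor \log(1/\epsilon) \rfloor$ cases. To this end, we define $\gamma_j =
\beta^{4\log(1/\epsilon) - 2(j+1)} \cdot \epsilon$. At the beginning of case $j$, we will have an affine space $A_j$ of dimension $n - j$ such
that $W(V^0 \cap A_j) \ge \left( \epsilon - 2 \sum_{\ell=0}^{j-1} \gamma_\ell \right) 2^n$. We note that this is indeed satisfied at the beginning of case 1. To
see this, recall that $W(V^0 \setminus S_0) > (\epsilon - \gamma_0) 2^n$. Also, we have that

$W\left( (V^0 \setminus S_0) \setminus (V^0 \cap A_1) \right) \le 2 \left| (V^0 \setminus S_0) \setminus (V^0 \cap A_1) \right| \le 2 \kappa_1 2^n = \gamma_0 2^n$.

These together imply that $W(V^0 \cap A_1) \ge (\epsilon - 2\gamma_0) 2^n$, confirming the hypothesis for $j = 1$.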

We next define $V^j = V^0 \cap A_j$, $V^j_+ = V^j \cap V_+$ and $V^j_- = V^j \cap V_-$. Similarly, define $\Delta^j_+ = V^0_+ \setminus V^j_+$
and $\Delta^j_- = V^0_- \setminus V^j_-$. Let $A'_{j+1} = A_j \cap A_0$. Note that $A_j \not\subseteq A_0$. This is because $A_j$ contains points
from $\{-1,1\}^n$ as opposed to $A_0$, which does not. Also, $A_j$ is not contained in a hyperplane parallel to $A_0$
because $A_j$ contains points of the unit hypercube lying on either side of $A_0$. Hence it must be the case that
$\dim(A'_{j+1}) = n - (j+1)$. Let $\ell_j$ be a line orthogonal to $A'_{j+1}$ which is parallel to $A_j$. Again, we observe
that the direction of $\ell_j$ is unique.

We next observe that all points in $A'_{j+1}$ project to the same point in $\ell_j$, which we call $q_j$. Let us define
$\Lambda^j_+ = \ell_j(V^j_+)$ and $\Lambda^j_- = \ell_j(V^j_-)$. We state the following important observation.

Observation 33. The sets $\Lambda^j_+$ and $\Lambda^j_-$ are separated by $q_j$.

Next, we define $S_j$ as:

$S_j = \{x \in V^j \mid \|\ell_j(x) - q_j\|_2 \ge \beta\}$.

The next claim is analogous to Claim 32. It says that if $W(S_j)$ is not too small, then we get the desired
lower bound on the Chow distance. The proof is slightly more technical and uses Lemma 27.

Claim 34. For $j \le \log(8/\epsilon)$, suppose that $W(S_j) \ge \gamma_j \cdot 2^n$ where $\gamma_j$ is as defined above. Then $d_{\mathrm{Chow}}(f, g) \ge
\delta$.
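Proof. We start by observing that

$\left( \epsilon - \delta - 4 \sum_{\ell=0}^{j-1} \gamma_\ell \right) 2^{n-1} \le W(V^j_+), W(V^j_-) \le (\epsilon + \delta) 2^{n-1}$.

The upper bound is obvious because $V^j_+ \subseteq V^0_+$ and $V^j_- \subseteq V^0_-$ and the range of $W$ is non-negative. To see
the lower bound, note that $W(V^0 \setminus V^j) \le 2 \left( \sum_{\ell=0}^{j-1} \gamma_\ell \right) 2^n$. As $V^0_+ \setminus V^j_+$ and $V^0_- \setminus V^j_-$ are both contained in
$V^0 \setminus V^j$, we get the stated lower bound. We also note that

$2 \sum_{\ell=0}^{j-1} \gamma_\ell \, 2^n = 2 \sum_{\ell=0}^{j-1} \beta^{4\log(1/\epsilon) - 2\ell - 2} \cdot \epsilon \cdot 2^n \le 4 \beta^{4\log(1/\epsilon) - 2j} \, 2^n$.

This implies that the sets $V^j_+$ and $V^j_-$ are $(\epsilon, 4\beta^{4\log(1/\epsilon) - 2j} + \delta)$-balanced. In particular, using that $\delta \le
4\beta^{4\log(1/\epsilon) - 2j}$, we can say that the sets $V^j_+$ and $V^j_-$ are $(\epsilon, 8\beta^{4\log(1/\epsilon) - 2j})$-balanced. We also observe
that for $j \le \log(8/\epsilon)$, we have that $8\beta^{4\log(1/\epsilon) - 2j} \le \epsilon/8$. Let us define $\mu^j_+ = \sum_{x \in V^j_+} W(x) \cdot x$ and
$\mu^j_- = \sum_{x \in V^j_-} W(x) \cdot x$. An application of Proposition 31 yields that

$|(\mu^j_+ - \mu^j_-) \cdot \hat{\ell}_j| \ge \left( \beta\gamma_j - 8\beta^{4\log(1/\epsilon) - 2j} \sqrt{2\ln(16/\epsilon)} \right) 2^n$.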

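We now note that

$(\mu_+ - \mu_-) \cdot \hat{\ell}_j = (\mu^j_+ - \mu^j_-) \cdot \hat{\ell}_j + \left( \sum_{x \in \Delta^j_+} W(x) \cdot x - \sum_{x \in \Delta^j_-} W(x) \cdot x \right) \cdot \hat{\ell}_j$.

Defining $\mu'_+ = \sum_{x \in \Delta^j_+} W(x) \cdot x$ and $\mu'_- = \sum_{x \in \Delta^j_-} W(x) \cdot x$, the triangle inequality implies that

$|(\mu_+ - \mu_-) \cdot \hat{\ell}_j| \ge |(\mu^j_+ - \mu^j_-) \cdot \hat{\ell}_j| - |\mu'_+ \cdot \hat{\ell}_j| - |\mu'_- \cdot \hat{\ell}_j|$.

Using Lemma 27 and that $W(\Delta^j_+), W(\Delta^j_-) \le W(V^0 \setminus V^j) \le 8\beta^{4\log(1/\epsilon) - 2j} \cdot 2^n$, we get that

$|\mu'_+ \cdot \hat{\ell}_j| = \left| \sum_{x \in \Delta^j_+} W(x) \cdot x \cdot \hat{\ell}_j \right| = O\left( |\Delta^j_+| \cdot \sqrt{\log(2^n / |\Delta^j_+|)} \right) = O\left( \beta^{4\log(1/\epsilon) - 2j} \cdot \log^{3/2}(1/\epsilon) \cdot 2^n \right)$

and similarly

$|\mu'_- \cdot \hat{\ell}_j| = \left| \sum_{x \in \Delta^j_-} W(x) \cdot x \cdot \hat{\ell}_j \right| = O\left( |\Delta^j_-| \cdot \sqrt{\log(2^n / |\Delta^j_-|)} \right) = O\left( \beta^{4\log(1/\epsilon) - 2j} \cdot \log^{3/2}(1/\epsilon) \cdot 2^n \right)$.

This implies that

$|(\mu_+ - \mu_-) \cdot \hat{\ell}_j| \ge \left( \beta\gamma_j - 8\beta^{4\log(1/\epsilon) - 2j} \sqrt{2\ln(16/\epsilon)} \right) 2^n - O\left( \beta^{4\log(1/\epsilon) - 2j} \cdot \log^{3/2}(1/\epsilon) \cdot 2^n \right)$.

Plugging in the value of $\gamma_j$, we see that for $\epsilon$ smaller than a sufficiently small constant, we have that

$|(\mu_+ - \mu_-) \cdot \hat{\ell}_j| \ge (\beta\gamma_j/2) \cdot 2^n$.

An application of Proposition 29 finally gives us that

$d_{\mathrm{Chow}}(f, g) \ge 2^{-n} \|\mu_+ - \mu_-\| \ge 2^{-n} |(\mu_+ - \mu_-) \cdot \hat{\ell}_j| \ge \beta\gamma_j/2 \ge \delta$,

which establishes the claim. □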

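If the hypothesis of Claim 34 fails, then we construct an affine space $A_{j+1}$ of dimension $n - j - 1$
such that $W(V^0 \cap A_{j+1}) \ge \left( \epsilon - 2 \sum_{\ell=0}^{j} \gamma_\ell \right) 2^n$, as described next. We recall that $U = \bigcup_{i=1}^{n} \{e_i\} \cup \{\mathbf{0}\}$. It is
obvious that there is some subset $Y_j \subseteq U$ such that $|Y_j| = j$ and $\mathrm{span}(A_j \cup Y_j) = \mathbb{R}^n$. Now, let us define
$H'_j \stackrel{\mathrm{def}}{=} \mathrm{span}(Y_j \cup A'_{j+1})$. Clearly, $H'_j$ is a hyperplane and every point $x \in (V^0 \cap A_j) \setminus S_j$ is at a distance
at most $\beta$ from $H'_j$. This is because every $x \in (V^0 \cap A_j) \setminus S_j$ is at a distance at most $\beta$ from $A'_{j+1}$ and
$A'_{j+1} \subseteq H'_j$. Also, note that all $x \in Y_j$ lie on $H'_j$.

Note that $W((V^0 \cap A_j) \setminus S_j) \ge \left( \epsilon - 2 \sum_{\ell=0}^{j-1} \gamma_\ell - \gamma_j \right) 2^n$. As prior calculation has shown, for $j \le
\log(8/\epsilon)$ we have $W((V^0 \cap A_j) \setminus S_j) \ge \left( \epsilon - 2 \sum_{\ell=0}^{j-1} \gamma_\ell - \gamma_j \right) 2^n \ge (\epsilon/2) 2^n$. Using Fact 28, we get that
$|(V^0 \cap A_j) \setminus S_j| \ge (\epsilon/4) 2^n$. Thus, putting $\kappa_j = \gamma_j/2$ and applying Lemma 22, we get a new hyperplane
$\mathcal{H}_j$ such that $|((V^0 \cap A_j) \setminus S_j) \setminus (\mathcal{H}_j \cap V^0)| \le (\gamma_j/2) \cdot 2^n$. Using that the range of $W$ is bounded by 2, we
get $W(((V^0 \cap A_j) \setminus S_j) \setminus (\mathcal{H}_j \cap V^0)) \le \gamma_j \cdot 2^n$. Thus, we get that $W(\mathcal{H}_j \cap V^0 \cap A_j) \ge \left( \epsilon - 2 \sum_{\ell=0}^{j} \gamma_\ell \right) 2^n$.
Also, $Y_j \subseteq \mathcal{H}_j$.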

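Let us now define $A_{j+1} = A_j \cap \mathcal{H}_j$. It is clear that $W(A_{j+1} \cap V^0) \ge \left( \epsilon - 2 \sum_{\ell=0}^{j} \gamma_\ell \right) 2^n$. Also,
$\dim(A_{j+1}) < \dim(A_j)$. To see this, assume for contradiction that $\dim(A_j) = \dim(A_{j+1})$. This means that
$A_j \subseteq \mathcal{H}_j$. Also, $Y_j \subseteq \mathcal{H}_j$. This means that $\mathrm{span}(A_j \cup Y_j) \subseteq \mathcal{H}_j$. But $\mathrm{span}(A_j \cup Y_j) = \mathbb{R}^n$, which cannot
be contained in $\mathcal{H}_j$. Thus we have that $\dim(A_{j+1}) = \dim(A_j) - 1$.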

Now we observe that taking $j = \lfloor \log(8/\epsilon) \rfloor$, we have a subspace $A_j$ of dimension $n - j$ which has
$W(A_j \cap V^0) \ge \left( \epsilon - 2 \sum_{\ell=0}^{j-1} \gamma_\ell \right) 2^n > (\epsilon/2) 2^n$. By Fact 28, we have that $|A_j \cap V^0| \ge (\epsilon/4) 2^n$. However, by
Fact 16, a subspace of dimension $n - j$ can contain at most $2^{n-j}$ points of $\{-1,1\}^n$. Since $j = \lfloor \log(8/\epsilon) \rfloor$,
this leads to a contradiction. That implies that the number of cases must be strictly less than $\lfloor \log(8/\epsilon) \rfloor$. In
particular, for some $j < \lfloor \log(8/\epsilon) \rfloor$, it must be the case that $W(S_j) \ge \gamma_j 2^n$. For this $j$, by Claim 34, we get a
lower bound of $\delta$ on $d_{\mathrm{Chow}}(f, g)$. This concludes the proof of Theorem 7. □

6 The Algorithm and its Analysis 

6.1 Algorithm and Proof Overview. In this section we give a proof overview of Theorem 10, restated below for convenience. We give the formal details of the proof in the following subsection.

Theorem 10 (Main Algorithmic Result). There exists a randomized algorithm ChowReconstruct that for every Boolean function $f : \{-1,1\}^n \to \{-1,1\}$, given $\epsilon > 0$, $\delta > 0$ and a vector $\vec{\alpha} = (\alpha_0, \alpha_1, \ldots, \alpha_n)$ such that $\|\vec{\chi}_f - \vec{\alpha}\| \leq \epsilon$, with probability at least $1 - \delta$, outputs an LBF $g$ such that $\|\vec{\chi}_f - \vec{\chi}_g\| \leq 6\epsilon$. The algorithm runs in time $\tilde{O}(n^2 \epsilon^{-4} \log(1/\delta))$. Further, $g$ is represented by a weight vector $\kappa v \in \mathbb{R}^{n+1}$, where $\kappa \in \mathbb{R}$ and $v$ is an integer vector of length $\|v\| = O(\sqrt{n}/\epsilon^3)$.

We now provide an intuitive overview of the algorithm and its analysis. Our algorithm is motivated by the following intuitive reasoning: since the function $\alpha_0 + \sum_{i \in [n]} \alpha_i \cdot x_i$ has the desired Chow parameters, why not just use it to define an LBF $g_1$ as $P_1(\alpha_0 + \sum_{i \in [n]} \alpha_i \cdot x_i)$? The answer, of course, is that as a result of applying the projection operator, the Chow parameters of $g_1$ can become quite different from the desired vector $\vec{\alpha}$. Nevertheless, it seems quite plausible to expect that $g_1$ will be better than a random guess.

Given the Chow parameters of $g_1$ we can try to correct them by adding the difference between $\vec{\alpha}$ and $\vec{\chi}_{g_1}$ to the vector that represents $g_1$. Again, intuitively we are adding a real-valued function $h_1 = \alpha_0 - \hat{g}_1(0) + \sum_{i \in [n]} (\alpha_i - \hat{g}_1(i)) \cdot x_i$ with the Chow parameters that we would like to add to the Chow parameters of $g_1$. And, again, the projection operation is likely to ruin our intention, but we could still hope that we got closer to $f$ and that by doing this operation for a while we will converge to an LBF with Chow parameters close to $\vec{\alpha}$.

While this idea might appear too naive, this is almost exactly what we do in ChowReconstruct. The main difference between this naive proposal and our actual algorithm is that at step $t$ we actually add only half the difference between $\vec{\alpha}$ and the Chow vector of the current hypothesis $\vec{\chi}_{g_t}$. This is necessary in our proof to offset the fact that $\vec{\alpha}$ is only an approximation to $\vec{\chi}_f$ and we can only approximate the Chow parameters of $g_t$. An additional minor modification is required to ensure that the resulting weight vector is a multiple of an integer weight vector of length $O(\sqrt{n}/\epsilon^3)$.

Proving the correctness of this algorithm roughly proceeds as follows. If the difference vector is sufficiently large (namely, more than a small multiple of $\|\vec{\chi}_f - \vec{\alpha}\|$) then the linear function $h_t$ defined by this vector can be easily seen as being correlated with $f - g_t$, namely $\mathbf{E}[(f - g_t) h_t] \geq c \|\vec{\chi}_{g_t} - \vec{\alpha}\|^2$ for a constant $c > 0$. As was shown in [TTV09] and [Fel10], this condition for a Boolean $h_t$ can be used to decrease a simple potential function measuring $\mathbf{E}[(f - g_t)^2]$, the $\ell_2^2$ distance of the current hypothesis to $f$. One issue that arises is this: while the $\ell_2^2$ distance is only reduced if $h_t$ is added to $g_t$, in order to ensure that $g_{t+1}$ is an LBF, we need to add the vector of difference (used to define $h_t$) to the weight vector representing $g_t$. To overcome this problem the proof in [TTV09] uses an additional point-wise counting argument from [Imp95]. This counting argument can be adapted to the real-valued $h_t$, but the resulting argument becomes quite cumbersome. Instead, we augment the potential function in a way that captures the additional counting argument from [Imp95] and easily generalizes to the real-valued case.



6.2 Proof of Theorem 10. We build $g$ through the following iterative process. Let $g'_0 \equiv 0$ and let $g_0 = P_1(g'_0)$. Given $g_t$, we compute the Chow parameters of $g_t$ to accuracy $\epsilon/(4\sqrt{n+1})$ and let $(\beta_0, \beta_1, \ldots, \beta_n)$ denote the results. For each $0 \leq i \leq n$ we define $\tilde{g}_t(i)$ to be the closest value to $\beta_i$ that ensures that $\alpha_i - \tilde{g}_t(i)$ is an integer multiple of $\epsilon/(2\sqrt{n+1})$. Let $\tilde{\chi}_{g_t} = (\tilde{g}_t(0), \ldots, \tilde{g}_t(n))$ denote the resulting vector of coefficients. Note that

$$\|\tilde{\chi}_{g_t} - \vec{\chi}_{g_t}\| \leq \sqrt{\sum_{i=0}^{n} \left(\epsilon/(2\sqrt{n+1})\right)^2} = \epsilon/2.$$

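If $\rho := \|\vec{\alpha} - \tilde{\chi}_{g_t}\| \leq 4\epsilon$ then we stop and output $g_t$. By the triangle inequality,

$$\|\vec{\chi}_f - \vec{\chi}_{g_t}\| \leq \|\vec{\chi}_f - \vec{\alpha}\| + \|\vec{\alpha} - \tilde{\chi}_{g_t}\| + \|\tilde{\chi}_{g_t} - \vec{\chi}_{g_t}\| \leq \epsilon(1 + 4 + 1/2) < 6\epsilon,$$

in other words $g_t$ satisfies the claimed condition.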

Otherwise (when $\rho > 4\epsilon$), let $g'_{t+1} = g'_t + h_t/2$ and $g_{t+1} = P_1(g'_{t+1})$ for

$$h_t := \sum_{i=0}^{n} (\alpha_i - \tilde{g}_t(i)) x_i.$$

Note that this is equivalent to adding the vector $(\vec{\alpha} - \tilde{\chi}_{g_t})/2$ to the degree-0 and degree-1 Fourier coefficients of $g'_t$ (which are also the components of the vector representing $g_t$).
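The iteration just described can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not part of the algorithm as stated: Chow parameters are computed exactly by enumerating all $2^n$ points (so it is usable only for tiny $n$), the rounding of the estimates to multiples of $\epsilon/(2\sqrt{n+1})$ is omitted, and the names `chow_vector`, `project` and `chow_reconstruct` are hypothetical.

```python
import itertools
import math


def cube_points(n):
    """All points of {-1, 1}^n."""
    return list(itertools.product((-1, 1), repeat=n))


def chow_vector(values, points):
    """Exact degree-0 and degree-1 Fourier coefficients of a function
    given by its list of values on `points` (brute-force enumeration)."""
    n, total = len(points[0]), len(points)
    chow = [sum(values) / total]
    for i in range(n):
        chow.append(sum(v * x[i] for v, x in zip(values, points)) / total)
    return chow


def project(v):
    """The projection P_1 onto [-1, 1]."""
    return max(-1.0, min(1.0, v))


def chow_reconstruct(alpha, n, eps, max_steps=100000):
    """Simplified iteration: g'_{t+1} = g'_t + h_t / 2, g_t = P_1(g'_t).
    Returns the weights of g' and the number of update steps performed."""
    points = cube_points(n)
    w = [0.0] * (n + 1)  # g'_0 = 0; w[0] is the constant term
    for step in range(max_steps):
        g = [project(w[0] + sum(w[i + 1] * x[i] for i in range(n))) for x in points]
        diff = [a - b for a, b in zip(alpha, chow_vector(g, points))]
        rho = math.sqrt(sum(d * d for d in diff))
        if rho <= 4 * eps:  # stopping rule from the proof
            return w, step
        w = [wi + d / 2 for wi, d in zip(w, diff)]
    raise RuntimeError("did not converge within max_steps")


# Example: recover an LBF whose Chow vector is close to that of majority on 5 bits.
n, eps = 5, 0.05
pts = cube_points(n)
f = [1 if sum(x) > 0 else -1 for x in pts]
alpha = chow_vector(f, pts)
weights, steps = chow_reconstruct(alpha, n, eps)
```

Because the Chow parameters here are exact rather than sampled, the sketch does not exercise the error terms that the proof has to absorb; it only shows the update and the stopping rule.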

To prove the convergence of this process we define a potential function at step $t$ as

$$\begin{aligned}
E(t) &= \mathbf{E}[(f - g_t)^2] + 2\mathbf{E}[(f - g_t)(g_t - g'_t)] \\
&= \mathbf{E}[(f - g_t)(f - 2g'_t + g_t)].
\end{aligned}$$

The key claim of this proof is that

$$E(t+1) - E(t) \leq -2\epsilon^2.$$



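To prove this claim we first prove that

$$\mathbf{E}[(f - g_t) h_t] \geq \rho\left(\rho - \tfrac{3}{2}\epsilon\right). \quad (1)$$

To prove equation (1) we observe that, by the Cauchy-Schwarz inequality,

$$\begin{aligned}
\mathbf{E}[(f - g_t) h_t] &= \sum_{i=0}^{n} (\hat{f}(i) - \hat{g}_t(i))(\alpha_i - \tilde{g}_t(i)) \\
&= \sum_{i=0}^{n} \left[ (\hat{f}(i) - \alpha_i)(\alpha_i - \tilde{g}_t(i)) + (\tilde{g}_t(i) - \hat{g}_t(i))(\alpha_i - \tilde{g}_t(i)) + (\alpha_i - \tilde{g}_t(i))^2 \right] \\
&\geq -\rho\epsilon - \rho\epsilon/2 + \rho^2 \geq \rho^2 - \tfrac{3}{2}\rho\epsilon.
\end{aligned}$$

In addition, by Parseval's identity,

$$\mathbf{E}[h_t^2] = \sum_{i=0}^{n} (\alpha_i - \tilde{g}_t(i))^2 = \rho^2. \quad (2)$$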



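Now,

$$\begin{aligned}
E(t+1) - E(t) &= \mathbf{E}[(f - g_{t+1})(f - 2g'_{t+1} + g_{t+1})] - \mathbf{E}[(f - g_t)(f - 2g'_t + g_t)] \\
&= \mathbf{E}[(f - g_t)(2g'_t - 2g'_{t+1}) + (g_{t+1} - g_t)(2g'_{t+1} - g_t - g_{t+1})] \\
&= -\mathbf{E}[(f - g_t) h_t] + \mathbf{E}[(g_{t+1} - g_t)(2g'_{t+1} - g_t - g_{t+1})]. \quad (3)
\end{aligned}$$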

To upper-bound the expression $\mathbf{E}[(g_{t+1} - g_t)(2g'_{t+1} - g_t - g_{t+1})]$ we prove that for every point $x \in \{-1,1\}^n$,

$$(g_{t+1}(x) - g_t(x))(2g'_{t+1}(x) - g_t(x) - g_{t+1}(x)) \leq h_t(x)^2/2.$$

We first observe that

$$|g_{t+1}(x) - g_t(x)| = |P_1(g'_t(x) + h_t(x)/2) - P_1(g'_t(x))| \leq |h_t(x)/2|$$

(a projection operation does not increase the distance). Now

$$|2g'_{t+1}(x) - g_t(x) - g_{t+1}(x)| \leq |g'_{t+1}(x) - g_t(x)| + |g'_{t+1}(x) - g_{t+1}(x)|.$$

The first part $|g'_{t+1}(x) - g_t(x)| = |h_t(x)/2 + g'_t(x) - g_t(x)| \leq |h_t(x)/2|$ unless $g'_t(x) - g_t(x) \neq 0$ and $g'_t(x) - g_t(x)$ has the same sign as $h_t(x)$. By the definition of $P_1$, this implies that $g_t(x) = \mathrm{sign}(g'_t(x))$ and $\mathrm{sign}(h_t(x)) = \mathrm{sign}(g'_t(x) - g_t(x)) = g_t(x)$. However, in this case $|g'_{t+1}(x)| \geq |g'_t(x)| > 1$ and $\mathrm{sign}(g'_{t+1}(x)) = \mathrm{sign}(g'_t(x)) = g_t(x)$. As a result $g_{t+1}(x) = g_t(x)$ and $(g_{t+1}(x) - g_t(x))(2g'_{t+1}(x) - g_t(x) - g_{t+1}(x)) = 0$. Similarly, for the second part: $|g'_{t+1}(x) - g_{t+1}(x)| > |h_t(x)/2|$ implies that $g_{t+1}(x) = \mathrm{sign}(g'_{t+1}(x))$ and $|g'_{t+1}(x)| > |h_t(x)/2| + 1$. This implies that $|g'_t(x)| \geq |g'_{t+1}(x)| - |h_t(x)/2| > 1$ and $g_t(x) = \mathrm{sign}(g'_t(x)) = \mathrm{sign}(g'_{t+1}(x)) = g_{t+1}(x)$. Altogether we obtain that

$$(g_{t+1}(x) - g_t(x))(2g'_{t+1}(x) - g_t(x) - g_{t+1}(x)) \leq \max\{0, |h_t(x)/2|(|h_t(x)/2| + |h_t(x)/2|)\} = h_t(x)^2/2.$$

This implies that

$$\mathbf{E}[(g_{t+1} - g_t)(2g'_{t+1} - g_t - g_{t+1})] \leq \mathbf{E}[h_t^2]/2 = \rho^2/2. \quad (4)$$
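The pointwise inequality behind equation (4) involves only three real numbers at a given point: $g'_t(x)$, $h_t(x)$, and the projection $P_1$. It can therefore be sanity-checked numerically. The following is a minimal sketch, not part of the proof; the function name `pointwise_gap`, the sampling ranges and the seed are arbitrary choices made for illustration.

```python
import random


def project(v):
    """The projection P_1 onto [-1, 1]."""
    return max(-1.0, min(1.0, v))


def pointwise_gap(g_prime, h):
    """Return h^2/2 minus the left-hand side of the pointwise inequality,
    for a single point where g'_t(x) = g_prime and h_t(x) = h."""
    g = project(g_prime)
    g_prime_next = g_prime + h / 2
    g_next = project(g_prime_next)
    lhs = (g_next - g) * (2 * g_prime_next - g - g_next)
    return h * h / 2 - lhs


rng = random.Random(0)
worst_gap = min(
    pointwise_gap(rng.uniform(-3.0, 3.0), rng.uniform(-4.0, 4.0))
    for _ in range(100000)
)
# The inequality predicts worst_gap >= 0 (up to floating-point error).
```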

By substituting equations (1) and (4) into equation (3), we obtain the claimed decrease in the potential function:

$$E(t+1) - E(t) \leq -\rho^2 + \tfrac{3}{2}\rho\epsilon + \rho^2/2 = -(\rho - 3\epsilon)\rho/2 < -2\epsilon^2.$$

We now observe that

$$E(t) = \mathbf{E}[(f - g_t)^2] + 2\mathbf{E}[(f - g_t)(g_t - g'_t)] \geq 0$$

for all $t$. This follows from noting that for every $x$ and $f(x) \in \{-1,1\}$, if $g_t(x) - g'_t(x)$ is non-zero then, by the definition of $P_1$, $g_t(x) = \mathrm{sign}(g'_t(x))$ and $\mathrm{sign}(g_t(x) - g'_t(x)) = -g_t(x)$. In this case, $f(x) - g_t(x) = 0$ or $\mathrm{sign}(f(x) - g_t(x)) = -g_t(x)$, and hence $(f(x) - g_t(x))(g_t(x) - g'_t(x)) \geq 0$. Therefore

$$\mathbf{E}[(f - g_t)(g_t - g'_t)] \geq 0$$

(and, naturally, $\mathbf{E}[(f - g_t)^2] \geq 0$). It is easy to see that $E(0) = 1$ and therefore this process will stop after at most $1/(2\epsilon^2)$ steps.

We now establish the claimed weight bound on the LBF output by the algorithm and the bound on the running time. Let $T$ denote the number of iterations of the algorithm. By our construction, the function $g_T = P_1(\sum_{t<T} h_t/2)$ is an LBF represented by a weight vector $w$ such that $w_i = \sum_{j<T} (\alpha_i - \tilde{g}_j(i))/2$. Our rounding of the estimates of Chow parameters of $g_t$ ensures that each of $(\alpha_i - \tilde{g}_j(i))/2$ is a multiple of $\kappa = \epsilon/(4\sqrt{n+1})$. Hence $g_T$ can be represented by the vector $w = \kappa v$, where the vector $v$ has only integer components. At every step $j$,

$$\sqrt{\sum_{i=0}^{n} (\alpha_i - \tilde{g}_j(i))^2} \leq 2 + \epsilon + \epsilon/2 = O(1).$$

Therefore, by the triangle inequality, $\|w\| = O(\epsilon^{-2})$ and hence $\|v\| = \|w\|/\kappa = O(\sqrt{n}/\epsilon^3)$.

The running time of the algorithm is essentially determined by finding $\tilde{\chi}_{g_t}$ in each step $t$. Finding $\tilde{\chi}_{g_t}$ requires estimating each $\hat{g}_t(i) = \mathbf{E}[g_t(x) \cdot x_i]$ to accuracy $\epsilon/(4\sqrt{n+1})$. Chernoff bounds imply that, by using the empirical mean of $g_t(x) \cdot x_i$ on $O((n/\epsilon^2) \cdot \log(n/(\epsilon\delta)))$ random points as our estimate of $\hat{g}_t(i)$, we can ensure that, with probability at least $1 - \delta$, the estimates are within $\epsilon/(4\sqrt{n+1})$ of the true values for all $n + 1$ Chow parameters of $g_t$ for every $t \leq T = O(\epsilon^{-2})$.

Evaluating $g_t$ on any point $x \in \{-1,1\}^n$ takes $O(n)$ time and we need to evaluate it on $O((n/\epsilon^2) \cdot \log(n/(\epsilon\delta)))$ points in each of $O(\epsilon^{-2})$ steps. This gives us the claimed total running time bound.
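To make the sample-size bound concrete, here is a small sketch that instantiates it with explicit constants. It assumes the standard Hoeffding inequality for $[-1,1]$-valued samples, $\Pr[|\text{mean} - \mathbf{E}| > \tau] \leq 2\exp(-m\tau^2/2)$, and a union bound over $n + 1$ parameters in each of at most $\lceil 1/(2\epsilon^2) \rceil$ steps. The constants and the function name `chow_sample_size` are illustrative assumptions, not the ones used in the paper.

```python
import math


def chow_sample_size(n, eps, delta):
    """Samples per step so that every Chow estimate, over all steps,
    is within eps / (4 sqrt(n+1)) of its mean with probability >= 1 - delta."""
    tau = eps / (4 * math.sqrt(n + 1))           # target accuracy per parameter
    steps = math.ceil(1 / (2 * eps ** 2))        # iteration bound from the proof
    num_estimates = (n + 1) * steps              # events in the union bound
    # Solve 2 * num_estimates * exp(-m * tau^2 / 2) <= delta for m.
    m = (2 / tau ** 2) * math.log(2 * num_estimates / delta)
    return math.ceil(m)


m = chow_sample_size(n=10, eps=0.1, delta=0.05)
```

The result scales as $(n/\epsilon^2) \cdot \log(n/(\epsilon\delta))$, matching the order of growth stated above.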

7 The Main Results 

7.1 Proofs of Theorems 1 and 2. In this subsection we put the pieces together and prove our main results. We start by giving a formal statement of Theorem 1:

Theorem 35 (Main). There is a function $\kappa(\epsilon) := 2^{-O(\log^3(1/\epsilon))}$ such that the following holds: Let $f : \{-1,1\}^n \to \{-1,1\}$ be an LTF and let $0 < \epsilon, \delta < 1/2$. Write $\vec{\chi}_f$ for the Chow vector of $f$ and assume that $\vec{\alpha} \in \mathbb{R}^{n+1}$ is a vector satisfying $\|\vec{\alpha} - \vec{\chi}_f\| \leq \kappa(\epsilon)$. Then, there is an algorithm $\mathcal{A}$ with the following property: Given as input $\vec{\alpha}$, $\epsilon$ and $\delta$, $\mathcal{A}$ performs $\tilde{O}(n^2) \cdot \mathrm{poly}(1/\kappa(\epsilon)) \cdot \log(1/\delta)$ bit operations and outputs the (weights-based) representation of an LTF $f^*$ which with probability at least $1 - \delta$ satisfies $\mathrm{dist}(f, f^*) \leq \epsilon$.

Proof of Theorem 35. Suppose that we are given a vector $\vec{\alpha} \in \mathbb{R}^{n+1}$ that satisfies $\Delta := \|\vec{\alpha} - \vec{\chi}_f\| \leq \kappa(\epsilon)$, where $f$ is the unknown LTF to be learned. To construct the desired $f^*$, we run algorithm ChowReconstruct (from Theorem 10) on input $\vec{\alpha}$. The algorithm runs in time $\mathrm{poly}(1/\Delta) \cdot \tilde{O}(n^2) \cdot \log(1/\delta)$ and outputs an LBF $g$ such that with probability at least $1 - \delta$ we have $d_{\mathrm{Chow}}(f, g) \leq 6\Delta \leq 6\kappa(\epsilon)$. (We can set the constants appropriately in the definition of the function $\kappa(\epsilon)$ above, so that the quantity on the RHS of the latter relation is smaller than the "quasi-polynomial" quantity we need in the main structural theorem, so that the conclusion is "$\mathrm{dist}(f, g) \leq \epsilon/2$".) By Theorem 7 we get that with probability at least $1 - \delta$ we have $\mathrm{dist}(f, g) \leq \epsilon/2$. Writing the LBF $g$ as $g(x) = P_1(v_0 + \sum_{i=1}^{n} v_i x_i)$, we now claim that $f^*(x) = \mathrm{sign}(v_0 + \sum_{i=1}^{n} v_i x_i)$ has $\mathrm{dist}(f, f^*) \leq \epsilon$. This is simply because for each input $x \in \{-1,1\}^n$, the contribution that $x$ makes to $\mathrm{dist}(f, f^*)$ is at most twice the contribution $x$ makes to $\mathrm{dist}(f, g)$. This completes the proof of Theorem 35. □

As a simple corollary, we obtain Theorem 2.

Proof of Theorem 2. Let $f : \{-1,1\}^n \to \{-1,1\}$ be an arbitrary LTF. We apply Theorem 35 above, for $\delta = 1/3$, and consider the LTF $f^*$ produced by the above proof. Note that the weights $v_i$ defining $f^*$ are identical to the weights of the LBF $g$ output by the algorithm ChowReconstruct. It follows from Theorem 10 that these weights are integers that satisfy $\sum_{i=1}^{n} v_i^2 = O(n \cdot \Delta^{-6})$, where $\Delta = \Omega(\kappa(\epsilon))$, and the proof is complete. □

As pointed out in Section 1.2, our algorithm runs in $\mathrm{poly}(n/\epsilon)$ time for LTFs whose integer weight is at most $\mathrm{poly}(n)$. Formally, we have:

Theorem 36. Let $f = \mathrm{sign}(\sum_{i=1}^{n} w_i x_i - \theta)$ be an LTF with integer weights $w_i$ such that $W := \sum_{i=1}^{n} |w_i| = \mathrm{poly}(n)$. Fix $0 < \epsilon, \delta < 1/2$. Write $\vec{\chi}_f$ for the Chow vector of $f$ and assume that $\vec{\alpha} \in \mathbb{R}^{n+1}$ is a vector satisfying $\|\vec{\alpha} - \vec{\chi}_f\| \leq \epsilon/(12W)$. Then, there is an algorithm $\mathcal{A}'$ with the following property: Given as input $\vec{\alpha}$, $\epsilon$ and $\delta$, $\mathcal{A}'$ performs $\mathrm{poly}(n/\epsilon) \cdot \log(1/\delta)$ bit operations and outputs the (weights-based) representation of an LTF $f^*$ which with probability at least $1 - \delta$ satisfies $\mathrm{dist}(f, f^*) \leq \epsilon$.

Proof. As stated before, both the algorithm and the proof of the above theorem are identical to the ones in Theorem 35. The details follow.

Given a vector $\vec{\alpha} \in \mathbb{R}^{n+1}$ satisfying $\Delta := \|\vec{\alpha} - \vec{\chi}_f\| \leq \epsilon/(12W)$, where $f$ is the unknown LTF, we run algorithm ChowReconstruct on input $\vec{\alpha}$. The algorithm runs in time $\mathrm{poly}(1/\Delta) \cdot \tilde{O}(n^2) \cdot \log(1/\delta)$, which is $\mathrm{poly}(n/\epsilon) \cdot \log(1/\delta)$ by our assumption on $W$, and outputs an LBF $g$ such that with probability at least $1 - \delta$, $d_{\mathrm{Chow}}(f, g) \leq 6\Delta \leq \epsilon/(2W)$. At this point, we need to apply the following simple structural result of [BDJ+98]:

Fact 37. Let $f = \mathrm{sign}(\sum_{i=1}^{n} w_i x_i - \theta)$ be an LTF with integer weights $w_i$, where $W := \sum_{i=1}^{n} |w_i|$, and let $g : \{-1,1\}^n \to [-1,1]$ be an arbitrary bounded function. Fix $0 < \epsilon < 1/2$. If $d_{\mathrm{Chow}}(f, g) \leq \epsilon/W$, then $\mathrm{dist}(f, g) \leq \epsilon$.

The above fact implies that, with probability at least $1 - \delta$, the LBF $g$ output by the algorithm satisfies $\mathrm{dist}(f, g) \leq \epsilon/2$. If $g(x) = P_1(v_0 + \sum_{i=1}^{n} v_i x_i)$, we similarly have that the LTF $f^*(x) = \mathrm{sign}(v_0 + \sum_{i=1}^{n} v_i x_i)$ has $\mathrm{dist}(f, f^*) \leq \epsilon$. This completes the proof. □

7.2 Near-optimality of Theorem 7. Theorem 7 says that if $f$ is an LTF and $g : \{-1,1\}^n \to [-1,1]$ satisfy $d_{\mathrm{Chow}}(f, g) \leq \epsilon$ then $\mathrm{dist}(f, g) \leq 2^{-\Omega(\sqrt[3]{\log(1/\epsilon)})}$.

It is natural to wonder whether the conclusion can be strengthened to "$\mathrm{dist}(f, g) \leq \epsilon^c$" where $c > 0$ is some absolute constant. Here we observe that no conclusion of the form "$\mathrm{dist}(f, g) \leq 2^{-\gamma(\epsilon)}$" is possible for any function $\gamma(\epsilon) = \omega(\log(1/\epsilon)/\log\log(1/\epsilon))$. To see this, fix $\gamma$ to be any function such that

$$\gamma(\epsilon) = \omega(\log(1/\epsilon)/\log\log(1/\epsilon)).$$

If there were a stronger version of Theorem 7 in which the conclusion is "then $\mathrm{dist}(f, g) \leq 2^{-\gamma(\epsilon)}$," the arguments of Section 7.1 would give that for any LTF $f$, there is an LTF $f' = \mathrm{sign}(v \cdot x - \nu)$ such that $\Pr[f(x) \neq f'(x)] \leq \epsilon$, where each $v_i \in \mathbb{Z}$ satisfies $|v_i| \leq \mathrm{poly}(n) \cdot (1/\epsilon)^{o(\log\log(1/\epsilon))}$. Taking $\epsilon = 1/2^{n+1}$, this tells us that $f'$ must agree with $f$ on every point in $\{-1,1\}^n$, and each integer weight in the representation $\mathrm{sign}(v \cdot x - \nu)$ is at most $2^{o(n \log n)}$. But choosing $f$ to be Håstad's function from [Has94], this is a contradiction, since any integer representation of that function must have every $|v_i| \geq 2^{\Omega(n \log n)}$.

8 Applications to learning theory 

In this section we show that our approach yields a range of interesting algorithmic applications in learning theory.

8.1 Learning threshold functions in the 1-RFA model. Ben-David and Dichterman [BDD98] introduced the "Restricted Focus of Attention" (RFA) learning framework to model the phenomenon (common in the real world) of a learner having incomplete access to examples. We focus here on the uniform-distribution "1-RFA" model. In this setting each time the learner is to receive a labeled example, it first specifies an index $i \in [n]$; then an $n$-bit string $x$ is drawn from the uniform distribution over $\{-1,1\}^n$ and the learner is given $(x_i, f(x))$. So for each labeled example, the learner is only shown the $i$-th bit of the example along with the label.

Birkendorf et al. [BDJ+98] asked whether LTFs can be learned in the uniform distribution 1-RFA model, and showed that a sample of $O(n \cdot W^2 \cdot \log(n/\delta)/\epsilon^2)$ many examples is information-theoretically sufficient for learning an unknown threshold function with integer weights $w_i$ that satisfy $\sum_i |w_i| \leq W$. The results of Goldberg [Gol06] and Servedio [Ser07] show that samples of size $(n/\epsilon)^{O(\log(n/\epsilon)\log(1/\epsilon))}$ and $\mathrm{poly}(n) \cdot 2^{\tilde{O}(1/\epsilon^2)}$ respectively are information-theoretically sufficient for learning an arbitrary LTF to accuracy $\epsilon$, but none of these earlier results gave a computationally efficient algorithm. [OS11] gave the first algorithm for this problem; as a consequence of their result for the Chow Parameters Problem, they gave an algorithm which learns LTFs to accuracy $\epsilon$ and confidence $1 - \delta$ in the uniform distribution 1-RFA model, running in $2^{2^{\tilde{O}(1/\epsilon^2)}} \cdot n^2 \cdot \log n \cdot \log(n/\delta)$ bit operations. As a direct consequence of Theorem 1 we obtain a much more time efficient learning algorithm for this learning task.

Theorem 38. There is an algorithm which performs $\tilde{O}(n^2) \cdot (1/\epsilon)^{O(\log^2(1/\epsilon))} \cdot \log(1/\delta)$ bit operations and properly learns LTFs to accuracy $\epsilon$ and confidence $1 - \delta$ in the uniform distribution 1-RFA model.
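The reason the Chow Parameters Problem is relevant to the 1-RFA model is that each Chow parameter $\hat{f}(i) = \mathbf{E}[f(x) x_i]$ depends only on the pair $(x_i, f(x))$, which is exactly what a 1-RFA example reveals. The sketch below simulates such an oracle and estimates the Chow vector by empirical means. The names `rfa_example` and `estimate_chow_1rfa`, the sample count and the seed are illustrative assumptions, and the sketch covers only the estimation step, not the full learning algorithm.

```python
import random


def rfa_example(f, n, i, rng):
    """Simulated 1-RFA oracle: draw x uniformly from {-1,1}^n and
    reveal only the requested bit x_i together with the label f(x)."""
    x = tuple(rng.choice((-1, 1)) for _ in range(n))
    return x[i], f(x)


def estimate_chow_1rfa(f, n, samples_per_parameter, rng):
    """Empirical estimates of the degree-0 and degree-1 Fourier coefficients,
    using only (x_i, f(x)) pairs."""
    m = samples_per_parameter
    # Degree 0: the label alone suffices, so the revealed bit is ignored.
    chow = [sum(rfa_example(f, n, 0, rng)[1] for _ in range(m)) / m]
    for i in range(n):
        pairs = [rfa_example(f, n, i, rng) for _ in range(m)]
        chow.append(sum(bit * label for bit, label in pairs) / m)
    return chow


def majority(x):
    return 1 if sum(x) > 0 else -1


rng = random.Random(1)
estimates = estimate_chow_1rfa(majority, 5, 20000, rng)
# For majority on 5 bits the exact values are 0 (degree 0) and 3/8 (each degree-1 coefficient).
```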

8.2 Agnostic-type learning. In this section we show that a variant of our main algorithm gives a very fast 
"agnostic-type" algorithm for learning LTFs under the uniform distribution. 

Let us briefly review the uniform distribution agnostic learning model [KSS94] in our context. Let $f : \{-1,1\}^n \to \{-1,1\}$ be an arbitrary Boolean function. We write $\mathrm{opt} = \mathrm{dist}(f, \mathcal{H}) := \min_{h \in \mathcal{H}} \Pr_x[h(x) \neq f(x)]$, where $\mathcal{H}$ denotes the class of LTFs. A uniform distribution agnostic learning algorithm is given uniform random examples labeled according to an arbitrary $f$ and outputs a hypothesis $h$ satisfying $\mathrm{dist}(h, f) \leq \mathrm{opt} + \epsilon$.






The only efficient algorithm for learning LTFs in this model [KKMS05] is non-proper and runs in time $n^{\mathrm{poly}(1/\epsilon)}$. This motivates the design of more efficient algorithms with potentially relaxed guarantees. [OS11] give an "agnostic-type" algorithm that guarantees $\mathrm{dist}(h, f) \leq \mathrm{opt}^{\Omega(1)} + \epsilon$ and runs in time $\mathrm{poly}(n) \cdot 2^{\mathrm{poly}(1/\epsilon)}$. In contrast, we give an algorithm that is significantly more efficient, but has a relaxed error guarantee.

Theorem 39. There is an algorithm $\mathcal{B}$ with the following performance guarantee: Let $f$ be any Boolean function and let $\mathrm{opt} = \mathrm{dist}(f, \mathcal{H})$. Given $0 < \epsilon, \delta < 1/2$ and access to independent uniform examples $(x, f(x))$, algorithm $\mathcal{B}$ outputs the (weights-based) representation of an LTF $f^*$ which with probability $1 - \delta$ satisfies $\mathrm{dist}(f^*, f) \leq 2^{-\Omega(\sqrt[3]{\log(1/\mathrm{opt})})} + \epsilon$. The algorithm performs $\tilde{O}(n^2) \cdot (1/\epsilon)^{O(\log^2(1/\epsilon))} \cdot \log(1/\delta)$ bit operations.

Proof. We describe the algorithm $\mathcal{B}$ in tandem with a proof of correctness. We start by estimating each Chow parameter of $f$ (using the random labeled examples) to accuracy $O(\kappa(\epsilon)/\sqrt{n})$; we thus compute a vector $\vec{\alpha} \in \mathbb{R}^{n+1}$ that satisfies $\Delta := \|\vec{\alpha} - \vec{\chi}_f\| \leq \kappa(\epsilon)$. We then run algorithm ChowReconstruct (from Theorem 10) on input $\vec{\alpha}$. The algorithm runs in time $\mathrm{poly}(1/\Delta) \cdot \tilde{O}(n^2) \cdot \log(1/\delta)$ and outputs an LBF $g$ such that with probability at least $1 - \delta$ we have $d_{\mathrm{Chow}}(f, g) \leq 6\Delta \leq 6\kappa(\epsilon)$. By assumption, there exists an LTF $h^*$ such that $\mathrm{dist}(h^*, f) \leq \mathrm{opt}$. By Fact 6 we get $d_{\mathrm{Chow}}(h^*, f) \leq 2\sqrt{\mathrm{opt}}$. An application of the triangle inequality now gives $d_{\mathrm{Chow}}(g, h^*) \leq 2\sqrt{\mathrm{opt}} + 6\kappa(\epsilon)$. By Theorem 7, we thus obtain $\mathrm{dist}(g, h^*) \leq 2^{-\Omega(\sqrt[3]{\log(1/\mathrm{opt})})} + \epsilon/2$. Writing the LBF $g$ as $g(x) = P_1(v_0 + \sum_{i=1}^{n} v_i x_i)$, we similarly have that $f^*(x) = \mathrm{sign}(v_0 + \sum_{i=1}^{n} v_i x_i)$ has $\mathrm{dist}(f, f^*) \leq 2^{-\Omega(\sqrt[3]{\log(1/\mathrm{opt})})} + \epsilon$. It is easy to see that the running time is dominated by the second step, and the proof of Theorem 39 is complete. □

9 Conclusions and Open Problems 

The problem of reconstructing a linear threshold function (exactly or approximately) from (exact or approximate values of) its degree-0 and degree-1 Fourier coefficients arises in various contexts and has been considered by researchers in electrical engineering, game theory, social choice and learning. In this paper, we gave an algorithm that reconstructs an $\epsilon$-approximate LTF (in Hamming distance) and runs in time $\tilde{O}(n^2) \cdot (1/\epsilon)^{O(\log^2(1/\epsilon))}$, improving the only previous provably efficient algorithm [OS11] by nearly two exponentials (as a function of $\epsilon$). Our algorithm yields the existence of nearly-optimal integer weight approximations for LTFs and gives significantly faster algorithms for several problems in learning theory.

We now list some interesting open problems:

• What is the complexity of the exact Chow parameters problem? The problem is easily seen to lie in $\mathrm{NP}^{\mathrm{PP}}$, and we are not aware of a better upper bound. We believe that the problem is intractable; in fact, we conjecture it is PP-hard.

• Is there an FPTAS for the problem, i.e. an algorithm running in $\mathrm{poly}(n/\epsilon)$ time? (Note that this would be best possible, assuming that the exact problem is intractable; in this sense our attained upper bound is close to optimal.) We believe so; in fact, we showed this is the case for $\mathrm{poly}(n)$ integer weight LTFs. (Note however that the arguments of Section 7.2 imply that our algorithm does not run in $\mathrm{poly}(n/\epsilon)$ time for general LTFs, and indeed imply that no algorithm that outputs a $\mathrm{poly}(n/\epsilon)$-weight LTF can succeed for this problem.)

• What is the optimal bound in Theorem 7? Any improvement would yield an improved running time for our algorithm.

• Our algorithmic approach is quite general. As was shown in [Fel12], this approach can also be used to learn small-weight low-degree PTFs. In addition, essentially the same algorithm was more recently used [DDS12] to solve a problem in social choice theory. Are there any other applications of our boosting-based approach?

• Does our structural result generalize to degree-$d$ PTFs? A natural generalization of Chow's theorem holds in this setting; more precisely, Bruck [Bru90] has shown that the Fourier coefficients of degree at most $d$ uniquely specify any degree-$d$ PTF within the space of all Boolean or even bounded functions. Is there a "robust version" of Bruck's theorem? We consider this to be a challenging open problem. (Note that our algorithmic machinery generalizes straightforwardly to this setting, hence a robust such result would immediately yield an efficient algorithm in this generalized setting.)

A Near-Optimality of Lemma 22

The following lemma shows that in any statement like Lemma 22 in which the hyperplane $H'$ passes through all the points in $S$, the distance bound on $\beta$ can be no larger than $n^{-1/2}$ as a function of $n$. This implies that the result obtained by taking $\kappa = 1/2^{n+1}$ in Lemma 22, which gives a distance bound of $n^{-(1/2 + o(1))}$ as a function of $n$, is optimal up to the $o(1)$ in the exponent.

Lemma 40. Fix $\epsilon \geq 8n^{-1/2}$. There is a hyperplane $H \subset \mathbb{R}^n$ and a set $S \subseteq \{-1,1\}^n$ such that $|S| \geq \frac{\epsilon}{8} 2^n$ and the following properties both hold:

• For every $x \in S$ we have $d(x, H) \leq 2\epsilon n^{-1/2}$; and

• There is no hyperplane $H'$ which passes through all the points in $S$.

Proof. Without loss of generality, let us assume $K = 4/\epsilon^2$ is an even integer; note that by assumption $K < n/2$. Now let us define the hyperplane $H$ by

$$H = \left\{ x \in \mathbb{R}^n : (x_1 + \cdots + x_K) + \frac{2(x_{K+1} + \cdots + x_n)}{n - K} = 0 \right\}.$$

Let us define $S = \{x \in \{-1,1\}^n : d(x, H) \leq 4/\sqrt{K(n - K)}\}$. It is easy to verify that every $x \in S$ indeed satisfies $d(x, H) \leq 2\epsilon n^{-1/2}$ as claimed. Next, let us define $A$ as follows:

$$A = \{x \in \{-1,1\}^n : x_1 + \cdots + x_K = 0 \text{ and } |x_{K+1} + \cdots + x_n| \leq 2\sqrt{n - K}\}.$$

It is easy to observe that $A \subseteq S$. Also, we have

$$\Pr_{x_1, \ldots, x_K}[x_1 + \cdots + x_K = 0] \geq (2\sqrt{K})^{-1}$$

and

$$\Pr_{x_{K+1}, \ldots, x_n}[|x_{K+1} + \cdots + x_n| \leq 2\sqrt{n - K}] \geq 1/2.$$

Hence we have that $|S| \geq \epsilon 2^n/8$. We also observe that the point $z \in \{-1,1\}^n$ defined as

$$z := (1, 1, \underbrace{1, -1, \ldots, 1, -1}_{K-2}, -1, \ldots, -1) \quad (5)$$

(whose first two coordinates are 1, next $K - 2$ coordinates alternate between 1 and $-1$, and final $n - K$ coordinates are $-1$) lies on $H$ and hence $z \in S$.

We next claim that the dimension of the affine span of the points in $A \cup \{z\}$ is $n$. This obviously implies that there is no hyperplane which passes through all points in $A \cup \{z\}$, and hence no hyperplane which passes through all points in $S$. Thus to prove the lemma it remains only to prove the following claim:

Claim 41. The dimension of the affine span of the elements of $A \cup \{z\}$ is $n$.

To prove the claim, we observe that if we let $Y$ denote the affine span of elements in $A \cup \{z\}$ and $Y'$ denote the linear space underlying $Y$, then it suffices to show that the dimension of $Y'$ is $n$. Each element of $Y'$ is obtained as the difference of two elements in $Y$.

First, let $y \in \{-1,1\}^n$ be such that

$$\sum_{i \leq K} y_i = \sum_{K+1 \leq i \leq n} y_i = 0.$$

Let $y^{\oplus i} \in \{-1,1\}^n$ be obtained from $y$ by flipping the $i$-th bit. For each $i \in \{K+1, \ldots, n\}$ we have that $y$ and $y^{\oplus i}$ are both in $A$, so subtracting the two elements, we get that the basis vector $e_i$ belongs to $Y'$ for each $i \in \{K+1, \ldots, n\}$.

Next, let $i \neq j \leq K$ be positions such that $y_i = 1$ and $y_j = -1$. Let $y^{ij}$ denote the vector which is the same as $y$ except that the signs are flipped at coordinates $i$ and $j$. Since $y^{ij}$ belongs to $A$, by subtracting $y$ from $y^{ij}$ we get that for every vector $e_{ij}$ ($i \neq j \leq K$) which has 1 in coordinate $i$, $-1$ in coordinate $j$, and 0 elsewhere, the vector $e_{ij}$ belongs to $Y'$.

The previous two paragraphs are easily seen to imply that the linear space $Y'$ contains all vectors $x \in \mathbb{R}^n$ that satisfy the condition $x_1 + \cdots + x_K = 0$. Thus to show that the dimension of $Y'$ is $n$, it suffices to exhibit any vector in $Y'$ that does not satisfy this condition. But it is easy to see that the vector $y - z$ (where $z$ is defined in (5)) is such a vector. This concludes the proof of the claim and of Lemma 40. □
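The span argument of Claim 41 can be checked by brute force on a toy instance. The sketch below uses $n = 8$ and $K = 4$, which are far too small to satisfy the hypothesis $\epsilon \geq 8n^{-1/2}$ of Lemma 40, so it illustrates only the linear-algebra step: the points of $A$ alone span an affine space of dimension $n - 1$, and adding $z$ raises the dimension to $n$. The helper names `rank` and `affine_dim` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def rank(rows):
    """Rank of an integer matrix via exact Gaussian elimination."""
    mat = [[Fraction(v) for v in row] for row in rows]
    rk = 0
    for col in range(len(mat[0])):
        pivot = next((r for r in range(rk, len(mat)) if mat[r][col] != 0), None)
        if pivot is None:
            continue
        mat[rk], mat[pivot] = mat[pivot], mat[rk]
        for r in range(len(mat)):
            if r != rk and mat[r][col] != 0:
                factor = mat[r][col] / mat[rk][col]
                mat[r] = [a - factor * b for a, b in zip(mat[r], mat[rk])]
        rk += 1
        if rk == len(mat):
            break
    return rk


def affine_dim(points):
    """Dimension of the affine span: rank of differences from a base point."""
    base = points[0]
    return rank([[a - b for a, b in zip(p, base)] for p in points[1:]])


n, K = 8, 4  # toy parameters, chosen only so that enumeration is cheap
A = [x for x in product((-1, 1), repeat=n)
     if sum(x[:K]) == 0 and sum(x[K:]) ** 2 <= 4 * (n - K)]
z = (1, 1) + (1, -1) * ((K - 2) // 2) + (-1,) * (n - K)

# z lies on H: (x_1 + ... + x_K) + 2 (x_{K+1} + ... + x_n) / (n - K) = 0.
z_on_H = Fraction(sum(z[:K])) + Fraction(2 * sum(z[K:]), n - K) == 0
dim_A = affine_dim(A)
dim_A_with_z = affine_dim(A + [z])
```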
B Useful variants of Goldberg's theorems

For technical reasons we require an extension of Theorem 21 (Theorem 3 of [Gol06]) which roughly speaking is as follows: the hypothesis is that not only does the set $S \subseteq \{-1,1\}^n$ lie close to hyperplane $H$ but so also does a (small) set $R$ of points in $\{0,1\}^n$; and the conclusion is that not only does "almost all" of $S$ (the subset $S^*$) lie on $H'$ but so also does all of $R$. To obtain this extension we need a corresponding extension of an earlier result of Goldberg (Theorem 2 of [Gol06]), which he uses to prove his Theorem 3; similar to our extension of Theorem 21, our extension of Theorem 2 of [Gol06] deals with points from both $\{-1,1\}^n$ and $\{0,1\}^n$. The simplest approach we have found to obtain our desired extension of Theorem 2 of [Gol06] uses the "Zeroth Inverse Theorem" of Tao and Vu [TV09]. We begin with a useful definition from their paper:

Definition 42. Given a vector $w = (w_1, \ldots, w_k)$ of real values, the cube $S(w)$ is the subset of $\mathbb{R}$ defined as

$$S(w) = \left\{ \sum_{i=1}^{k} \epsilon_i w_i : (\epsilon_1, \ldots, \epsilon_k) \in \{-1, 0, 1\}^k \right\}.$$

The "Zeroth Inverse Theorem" of [TV09] is as follows:

Theorem 43. Suppose $w \in \mathbb{R}^n$, $d \in \mathbb{N}$ and $\theta \in \mathbb{R}$ satisfy $\Pr_{x \in \{-1,1\}^n}[w \cdot x = \theta] > 2^{-d-1}$. Then there exists a $d$-element subset $A = \{i_1, \ldots, i_d\} \subset [n]$ such that for $v = (w_{i_1}, \ldots, w_{i_d})$ we have $\{w_1, \ldots, w_n\} \subseteq S(v)$.

For convenience of the reader, we include the proof here.

Proof of Theorem 43. Towards a contradiction, assume that there is no $v = (w_{i_1}, \ldots, w_{i_d})$ such that $\{w_1, \ldots, w_n\} \subseteq S(v)$. Then an obvious greedy argument shows that there are distinct integers $i_1, \ldots, i_{d+1} \in [n]$ such that $w_{i_1}, \ldots, w_{i_{d+1}}$ is dissociated, i.e. there does not exist $j \in [d+1]$ and $\epsilon_\ell \in \{-1, 0, 1\}$ such that

$$w_{i_j} = \sum_{\ell \neq j} \epsilon_\ell w_{i_\ell}.$$

Let $v = (w_{i_1}, \ldots, w_{i_{d+1}})$. By an averaging argument, it is easy to see that if $\Pr_{x \in \{-1,1\}^n}[w \cdot x = \theta] > 2^{-d-1}$, then there exists $\nu \in \mathbb{R}$ such that $\Pr_{x \in \{-1,1\}^{d+1}}[v \cdot x = \nu] > 2^{-d-1}$. By the pigeonhole principle, this means that there exist $x, y \in \{-1,1\}^{d+1}$ such that $x \neq y$ and $v \cdot ((x - y)/2) = 0$. Since the entries of $(x - y)/2$ are in $\{-1, 0, 1\}$, and not all the entries in $(x - y)/2$ are zero, this means that $v$ is not dissociated, resulting in a contradiction. □
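A brute-force illustration of Theorem 43 on a tiny weight vector may help fix the definitions. The sketch below computes the cube $S(v)$ with coefficients in $\{-1, 0, 1\}$, finds the smallest $d$ for which the most popular value $\theta$ has probability above $2^{-d-1}$, and then searches for a $d$-element subset whose cube covers all weights. The example vector and the helper names are assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations, product


def cube(v):
    """S(v): all sums of eps_i * v_i with each eps_i in {-1, 0, 1}."""
    return {sum(e * vi for e, vi in zip(eps, v))
            for eps in product((-1, 0, 1), repeat=len(v))}


def smallest_d(w):
    """Smallest d >= 1 with max over theta of Pr[w . x = theta] > 2^(-d-1)."""
    counts = Counter(sum(wi * xi for wi, xi in zip(w, x))
                     for x in product((-1, 1), repeat=len(w)))
    best = max(counts.values())  # number of points hitting the most popular theta
    d = 1
    while best * 2 ** (d + 1) <= 2 ** len(w):
        d += 1
    return d


def covering_subset(w, d):
    """First d-element index set (in lexicographic order) whose cube contains every weight."""
    for idx in combinations(range(len(w)), d):
        if set(w) <= cube([w[i] for i in idx]):
            return idx
    return None


w = (1, 1, 2, 3)
d = smallest_d(w)                # 3 of the 16 points give w . x = 1, so d = 2
subset = covering_subset(w, d)   # indices (0, 2), i.e. v = (1, 2)
```

Here $S((1, 2)) = \{-3, \ldots, 3\}$ contains every weight, as the theorem guarantees for $d = 2$.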

Armed with this result, we now prove the extension of Goldberg's Theorem 2 that we will need later: 

Theorem 44. Let $w \in \mathbb{R}^n$ have $\|w\|_2 = 1$ and let $\theta \in \mathbb{R}$ be such that $\Pr_{x \in \{-1,1\}^n}[w \cdot x = \theta] = \alpha$. Let $H$ denote the hyperplane $H = \{x \in \mathbb{R}^n \mid w \cdot x = \theta\}$. Suppose that $\mathrm{span}(H \cap (\{-1,1\}^n \cup \{0,1\}^n)) = H$, i.e. the affine span of the points in $\{-1,1\}^n \cup \{0,1\}^n$ that lie on $H$ is $H$. Then all entries of $w$ are integer multiples of $f(n, \alpha)^{-1}$, where

$$f(n, \alpha) \leq (2n)^{\lfloor \log(1/\alpha) \rfloor + 3/2} \cdot (\lfloor \log(1/\alpha) \rfloor)!.$$

Proof. We first observe that $w \cdot (x - y) = 0$ for any two points $x, y$ that both lie on $H$. Consider the system of homogeneous linear equations in variables $w'_1, \ldots, w'_n$ defined by

$$w' \cdot (x - y) = 0 \text{ for all } x, y \in H \cap (\{-1,1\}^n \cup \{0,1\}^n). \quad (6)$$

Since $\mathrm{span}(H \cap (\{-1,1\}^n \cup \{0,1\}^n))$ is by assumption the entire hyperplane $H$, the system (6) must have rank $n - 1$; in other words, every solution $w'$ that satisfies (6) must be some rescaling $w' = cw$ of the vector $w$ defining $H$.

2 In [TV09] the cube is defined only allowing $\epsilon_i \in \{-1, 1\}$ but this is a typographical error; their proof uses the $\epsilon_i \in \{-1, 0, 1\}$ version that we state.

Let $A$ denote a subset of $n - 1$ of the equations comprising (6) which has rank $n - 1$ (so any solution to $A$ must be a vector $w' = cw$ as described above). We note that each coefficient in each equation of $A$ lies in $\{-2, -1, 0, 1, 2\}$. Let us define $d = \lfloor \log(1/\alpha) \rfloor + 1$. By Theorem 43, there is some $w_{i_1}, \ldots, w_{i_{d'}}$ with $d' \leq d$ such that for $v := (w_{i_1}, \ldots, w_{i_{d'}})$, we have $\{w_1, \ldots, w_n\} \subseteq S(v)$; in other words, for all $j \in [n]$ we have $w_j = \sum_{\ell=1}^{d'} \epsilon_{\ell,j} w_{i_\ell}$ where each $\epsilon_{\ell,j}$ belongs to $\{-1, 0, 1\}$. Substituting these relations into the system $A$, we get a new system of homogeneous linear equations, of rank $d' - 1$, in the variables $w'_{i_1}, \ldots, w'_{i_{d'}}$, where all coefficients of all variables in all equations of the system are integers of magnitude at most $2n$.

Let $M$ denote a subset of $d' - 1$ equations from this new system which has rank $d' - 1$. In other words, viewing $M$ as a $d' \times (d' - 1)$ matrix, we have the equation $M \cdot v^T = 0$ where all entries in the matrix $M$ are integers in $[-2n, 2n]$. Note that at least one of the values $w_{i_1}, \ldots, w_{i_{d'}}$ is non-zero (for if all of them were 0, then since $\{w_1, \ldots, w_n\} \subseteq S(v)$ it would have to be the case that $w_1 = \cdots = w_n = 0$). Without loss of generality we may suppose that $w_{i_1}$ has the largest magnitude among $w_{i_1}, \ldots, w_{i_{d'}}$. We now fix the scaling constant $c$, where $w' = cw$, to be such that $w'_{i_1} = 1$. Rearranging the system $M(cv)^T = M(1, w'_{i_2}, \ldots, w'_{i_{d'}})^T = 0$, we get a new system of $d' - 1$ linear equations $M'(w'_{i_2}, \ldots, w'_{i_{d'}})^T = b$ where $M'$ is a $(d' - 1) \times (d' - 1)$ matrix whose entries are integers in $[-2n, 2n]$ and $b$ is a vector whose entries are integers in $[-2n, 2n]$.

We now use Cramer's rule to solve the system

$$M'(w'_{i_2}, \ldots, w'_{i_{d'}})^T = b.$$

This gives us that $w'_{i_j} = \det(M'_j)/\det(M')$ where $M'_j$ is the matrix obtained by replacing the $j$-th column of $M'$ by $b$. So each $w'_{i_j}$ is an integer multiple of $1/\det(M')$ and is bounded by 1 (by our earlier assumption about $w_{i_1}$ having the largest magnitude). Since $\{w'_1, \ldots, w'_n\} \subseteq S(v)$, we get that each value $w'_j$ is an integer multiple of $1/\det(M')$, and each $|w'_j| \leq n$. Finally, since $M'$ is a $(d' - 1) \times (d' - 1)$ matrix where every entry is an integer of magnitude at most $2n$, we have that $|\det(M')| \leq (2n)^{d'-1} \cdot (d' - 1)! \leq (2n)^{d-1} \cdot (d - 1)!$. Moreover, the $\ell_2$ norm of the vector $w'$ is bounded by $n^{3/2}$. So renormalizing (dividing by $c$) to obtain the unit vector $w$ back from $w' = cw$, we see that every entry of $w$ is an integer multiple of $1/N$, where $N$ is a quantity at most $(2n)^{d+1/2} \cdot d!$. Recalling that $d = \lfloor \log(1/\alpha) \rfloor + 1$, the theorem is proved. □
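The Cramer's rule step can be illustrated on a small integer system. The sketch below solves $M' y = b$ exactly and confirms two facts used in the proof: every solution coordinate is an integer multiple of $1/\det(M')$, and $|\det(M')|$ is at most $B^k \cdot k!$ for a $k \times k$ matrix with integer entries of magnitude at most $B$. The matrix, right-hand side and function names are arbitrary choices made for illustration.

```python
from fractions import Fraction
from math import factorial


def det(m):
    """Integer determinant by Laplace expansion along the first row
    (adequate for the tiny matrices used here)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def cramer_solve(m, b):
    """Solve m y = b by Cramer's rule; returns det(m) and the exact solution."""
    d = det(m)
    solution = []
    for j in range(len(m)):
        # Replace the j-th column of m by b.
        mj = [row[:j] + [b[i]] + row[j + 1:] for i, row in enumerate(m)]
        solution.append(Fraction(det(mj), d))
    return d, solution


M_prime = [[2, 1], [1, 3]]
b = [1, 2]
d, y = cramer_solve(M_prime, b)   # det = 5, y = (1/5, 3/5)

B = max(abs(e) for row in M_prime for e in row)
k = len(M_prime)
det_bound = B ** k * factorial(k)  # bound on |det| from expanding into k! products
```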

We next prove the extension of Theorem 3 from [Gol06] that we require. The proof is almost identical to the proof in [Gol06] except for the use of Theorem 44 instead of Theorem 2 from [Gol06] and a few other syntactic changes. For the sake of clarity and completeness, we give the complete proof here.

Theorem 45. Given any hyperplane $H$ in $\mathbb{R}^n$ whose $\beta$-neighborhood contains a subset $S$ of vertices of $\{-1,1\}^n$ where $|S| = \alpha \cdot 2^n$, there exists a hyperplane which passes through all the points of $(\{-1,1\}^n \cup \{0,1\}^n)$ that are contained in the $\beta$-neighborhood of $H$ provided that

$$0 \leq \beta \leq \left( (2/\alpha) \cdot n^{5 + \lfloor \log(n/\alpha) \rfloor} \cdot (2 + \lfloor \log(n/\alpha) \rfloor)! \right)^{-1}.$$

Before giving the proof, we note that the hypothesis of our theorem is the same as the hypothesis of
Theorem 3 of [Gol06]. The only difference in the conclusion is that while Goldberg proves that all points
of $\{-1, 1\}^n$ in the $\beta$-neighborhood of $H$ lie on the new hyperplane, we prove this for all the points of
$(\{-1, 1\}^n \cup \{0, 1\}^n)$ in the $\beta$-neighborhood of $H$.

Proof. Let $H = \{x \mid w \cdot x - t = 0\}$ with $\|w\| = 1$. Also, let $S = \{x \in \{-1, 1\}^n \mid d(x, H) \le \beta\}$ and
$S' = \{x \in (\{-1, 1\}^n \cup \{0, 1\}^n) \mid d(x, H) \le \beta\}$. For any $x \in S'$ we have that $w \cdot x \in [t - \beta, t + \beta]$.
Following [Gol06] we create a new weight vector $w' \in \mathbb{R}^n$ by rounding each coordinate $w_i$ of $w$ to the
nearest integer multiple of $\beta$ (rounding up in case of a tie). Since every $x \in S'$ has entries from $\{-1, 0, 1\}$,
we can deduce that for any $x \in S'$, we have

$$t - \beta - n\beta/2 \le w \cdot x - n\beta/2 < w' \cdot x \le w \cdot x + n\beta/2 \le t + \beta + n\beta/2.$$

Thus for every $x \in S'$, the value $w' \cdot x$ lies in a semi-open interval of length $\beta(n + 2)$; moreover, since it
only takes values which are integer multiples of $\beta$, there are at most $n + 2$ possible values that $w' \cdot x$ can take
for $x \in S'$. Since $S \subseteq S'$ and $|S| \ge \alpha 2^n$, there must be at least one value $t' \in (t - n\beta/2 - \beta, t + n\beta/2 + \beta]$
such that at least $\alpha 2^n/(n + 2)$ points in $S$ lie on the hyperplane $H_1$ defined as $H_1 = \{x : w' \cdot x = t'\}$. We
also let $A_1 = \mathrm{span}\{x \in S' : w' \cdot x = t'\}$. It is clear that $A_1 \subseteq H_1$. Also, since at least $\alpha 2^n/(n + 2)$ points
of $\{-1, 1\}^n$ lie on $A_1$, by Fact QU we get that $\dim(A_1) \ge n - \log(n + 2) - \log(1/\alpha)$.
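The rounding and pigeonhole steps can be sketched on a toy instance. This is only an illustration under stated assumptions: the vector `W`, threshold `T`, and grid size `BETA` are hypothetical values (with `BETA` far larger than the theorem allows, to keep the numbers readable), and the helper names are not from the paper. The sketch rounds each coordinate to the nearest multiple of $\beta$ with ties rounded up, then groups the cube points in the $\beta$-neighborhood by the value of $w' \cdot x$.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import floor


def round_to_multiple(value, beta):
    """Nearest integer multiple of beta, rounding up on ties."""
    return floor(value / beta + Fraction(1, 2)) * beta


def dot(u, x):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, x))


def bucket_points(w, t, beta):
    """Group the points x of {-1,1}^n with |w.x - t| <= beta by the value of w'.x."""
    n = len(w)
    w_rounded = [round_to_multiple(wi, beta) for wi in w]
    near = [x for x in product((-1, 1), repeat=n) if abs(dot(w, x) - t) <= beta]
    return w_rounded, near, Counter(dot(w_rounded, x) for x in near)


# Hypothetical instance: w is a unit vector, so |w.x - t| equals the distance to H.
W = [Fraction(3, 5), Fraction(4, 5), Fraction(0)]
T = Fraction(0)
BETA = Fraction(1, 4)
W_ROUNDED, NEAR, COUNTS = bucket_points(W, T, BETA)
```

In this instance `COUNTS` has two keys, well within the $n + 2$ possible values, and the largest bucket plays the role of the set of points lying on $H_1$. Exact `Fraction` arithmetic is used so that the tie-breaking rule and the multiples of $\beta$ are not blurred by floating-point error.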

It is easy to see that $\|w' - w\| \le \sqrt{n}\beta/2$, which implies that $\|w'\| \ge 1 - \sqrt{n}\beta/2$. Note that for
any $x \in S'$ we have $|w' \cdot x - t'| \le (n + 2)\beta$. Recalling Fact 20 we get that for any $x \in S'$ we have
$d(x, H_1) \le (\beta(n + 2))/(1 - \sqrt{n}\beta/2)$. Since $\sqrt{n}\beta \ll 1$, we get that $d(x, H_1) \le 2n\beta$ for every $x \in S'$.
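As a small numerical illustration of the distance bound just derived, the sketch below computes the point-to-hyperplane distance $|w \cdot x - t| / \|w\|$ and evaluates the cap $\beta(n+2)/(1 - \sqrt{n}\beta/2)$ for one choice of parameters. The function names and the sample values ($n = 100$, $\beta = 10^{-6}$) are assumptions made for illustration only.

```python
import math


def hyperplane_distance(w, t, x):
    """Euclidean distance from x to the hyperplane {y : w.y = t} (w must be nonzero)."""
    norm = math.sqrt(sum(c * c for c in w))
    return abs(sum(a * b for a, b in zip(w, x)) - t) / norm


def h1_distance_cap(n, beta):
    """The bound beta * (n + 2) / (1 - sqrt(n) * beta / 2) from the text."""
    return beta * (n + 2) / (1 - math.sqrt(n) * beta / 2)


# Hypothetical parameters: the cap stays below 2 * n * beta when sqrt(n) * beta is tiny.
CAP = h1_distance_cap(100, 1e-6)
```

For these sample values the cap is roughly $1.02 \times 10^{-4}$, comfortably below $2n\beta = 2 \times 10^{-4}$.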

At this point our plan for the rest of the proof of Theorem 45 is as follows: First we will construct a
hyperplane $H_k$ (by an inductive construction) such that $\mathrm{span}(H_k \cap (\{-1, 1\}^n \cup \{0, 1\}^n)) = H_k$, $A_1 \subseteq H_k$,
and all points in $S'$ are very close to $H_k$ (say within Euclidean distance $\gamma$). Then we will apply Theorem 44
to conclude that any point of $\{-1, 1\}^n \cup \{0, 1\}^n$ which is not on $H_k$ must have Euclidean distance at least
some $\gamma'$ from $H_k$. If $\gamma' > \gamma$ then we can infer that every point in $S'$ lies on $H_k$, which proves the theorem.
We now describe the construction that gives $H_k$.

If $\dim(A_1) = n - 1$, then we let $k = 1$ and stop the process, since as desired we have $\mathrm{span}(H_k \cap
(\{-1, 1\}^n \cup \{0, 1\}^n)) = H_k$, $A_1 = H_k$, and $d(x, H_k) \le 2n\beta$ for every $x \in S'$. Otherwise, by an inductive
hypothesis, we may assume that for some $j \ge 1$ we have an affine space $A_j$ and a hyperplane $H_j$ such that

• $A_1 \subseteq A_j \subseteq H_j$,

• $\dim(A_j) = \dim(A_1) + j - 1$, and

• for all $x \in S'$ we have $d(x, H_j) \le 2^j n\beta$.

Using this inductive hypothesis, we will construct an affine space $A_{j+1}$ and a hyperplane $H_{j+1}$ such
that $A_1 \subseteq A_{j+1} \subseteq H_{j+1}$, $\dim(A_{j+1}) = \dim(A_1) + j$, and for all $x \in S'$ we have

$$d(x, H_{j+1}) \le 2^{j+1} n\beta.$$

If $A_{j+1} = H_{j+1}$, we stop the process, else we continue.

We now describe the inductive construction. Since $A_j \subsetneq H_j$, there must exist an affine subspace $A'_j$
such that $A_j \subseteq A'_j \subsetneq H_j$ and $\dim(A'_j) = n - 2$. Let $x_j$ denote $\arg\max_{x \in S'} d(x, A'_j)$. (We assume that
$\max_{x \in S'} d(x, A'_j) > 0$; if not, then choose $x_j$ to be an arbitrary point in $\{-1, 1\}^n$ not lying on $A'_j$. In this
case, the properties of the inductive construction will trivially hold.) Define $H_{j+1} = \mathrm{span}(A'_j \cup x_j)$. It is
clear that $H_{j+1}$ is a hyperplane. We claim that for $x \in S'$ we have

$$d(x, H_{j+1}) \le d(x, H_j) + d(x_j, H_j) \le 2^j n\beta + 2^j n\beta = 2^{j+1} n\beta.$$

To see this, observe that without loss of generality we may assume that $H_j$ passes through the origin and thus
$A'_j$ is a linear subspace. Thus we have that $\|x_{\perp A'_j}\| \le \|(x_j)_{\perp A'_j}\|$ for all $x \in S'$, where for a point $z \in \mathbb{R}^n$
we write $z_{\perp A'_j}$ to denote the component of $z$ orthogonal to $A'_j$. Let $r = \|x_{\perp A'_j}\|$ and $r_1 = \|(x_j)_{\perp A'_j}\|$, where
$r_1 \ge r$. Let $\theta$ denote the angle that $x_{\perp A'_j}$ makes with $H_j$ and let $\phi$ denote the angle that
$(x_j)_{\perp A'_j}$ makes with $H_j$. Then it is easy to see that $d(x, H_{j+1}) = |r \cdot \sin(\theta - \phi)|$, $d(x, H_j) = |r \cdot \sin(\theta)|$ and $d(x_j, H_j) =
|r_1 \cdot \sin(\phi)|$. Thus, we only need to check that if $r_1 \ge r$, then $|r \cdot \sin(\theta - \phi)| \le |r \cdot \sin(\theta)| + |r_1 \cdot \sin(\phi)|$.
This follows by expanding $\sin(\theta - \phi) = \sin\theta\cos\phi - \cos\theta\sin\phi$ and using $|\cos\theta|, |\cos\phi| \le 1$ together with $r \le r_1$.
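The final inequality of this argument, $|r \sin(\theta - \phi)| \le |r \sin\theta| + |r_1 \sin\phi|$ whenever $r_1 \ge r \ge 0$, can also be spot-checked numerically. The sketch below is a sanity check rather than a proof; the function names, the sampling ranges, and the fixed seed are arbitrary choices made here.

```python
import math
import random


def distance_gap(r, r1, theta, phi):
    """Right-hand side minus left-hand side of the claimed inequality."""
    lhs = abs(r * math.sin(theta - phi))
    rhs = abs(r * math.sin(theta)) + abs(r1 * math.sin(phi))
    return rhs - lhs


def smallest_gap(trials=10000, seed=0):
    """Smallest gap observed over random samples with r1 >= r >= 0."""
    rng = random.Random(seed)
    worst = math.inf
    for _ in range(trials):
        r = rng.uniform(0.0, 10.0)
        r1 = r + rng.uniform(0.0, 10.0)  # enforce r1 >= r
        theta = rng.uniform(-math.pi, math.pi)
        phi = rng.uniform(-math.pi, math.pi)
        worst = min(worst, distance_gap(r, r1, theta, phi))
    return worst
```

A non-negative value of `smallest_gap()` (up to floating-point rounding) is consistent with the claim. The case $r = r_1$, $\theta = \pi/2$, $\phi = 0$ gives a gap of zero, so the inequality cannot be made strict in general.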

Let $A_{j+1} = \mathrm{span}(A_j \cup x_j)$ and note that $A_1 \subseteq A_{j+1} \subseteq H_{j+1}$ and $\dim(A_{j+1}) = \dim(A_j) + 1$. As
shown above, for all $x \in S'$ we have $d(x, H_{j+1}) \le 2^{j+1} n\beta$. This completes the inductive construction.

Since $\dim(A_1) \ge n - \log(n + 2) - \log(1/\alpha)$, the process must terminate for some $k \le \log(n + 2) +
\log(1/\alpha)$. When the process terminates, we have a hyperplane $H_k$ satisfying the following properties:

• $\mathrm{span}(H_k \cap (\{-1, 1\}^n \cup \{0, 1\}^n)) = H_k$; and

• $|H_k \cap S| \ge \alpha 2^n/(n + 2)$; and

• for all $x \in S'$ we have $d(x, H_k) \le 2^k n\beta \le (1/\alpha) n(n + 2)\beta$.
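The last property uses $2^k \le (n + 2)/\alpha$, which follows from $k \le \log(n + 2) + \log(1/\alpha)$. The sketch below tabulates the per-step bounds $2^j n\beta$ and compares them with the closed-form cap $(1/\alpha) n(n + 2)\beta$. The function names and the sample parameters are hypothetical and serve only to illustrate the bookkeeping.

```python
from fractions import Fraction


def max_steps(n, alpha):
    """Largest integer k with 2**k <= (n + 2) / alpha, i.e. k <= log(n+2) + log(1/alpha)."""
    limit = Fraction(n + 2) / Fraction(alpha)
    k = 0
    while 2 ** (k + 1) <= limit:
        k += 1
    return k


def distance_bounds(n, alpha, beta):
    """Bounds 2^j * n * beta for j = 1..k, plus the closed-form cap (1/alpha) n (n+2) beta."""
    k = max_steps(n, alpha)
    per_step = [2 ** j * n * beta for j in range(1, k + 1)]
    cap = Fraction(n * (n + 2)) / Fraction(alpha) * beta
    return per_step, cap


# Hypothetical parameters: n = 6, alpha = 1/4, beta = 1/1000.
STEPS, CAP = distance_bounds(6, Fraction(1, 4), Fraction(1, 1000))
```

With these sample values $(n + 2)/\alpha = 32$ is an exact power of two, so the final per-step bound coincides with the cap; for other parameters it is strictly smaller.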

We can now apply Theorem 44 to the hyperplane $H_k$ to get that if $H_k = \{x \mid v \cdot x - \nu = 0\}$ with $\|v\| = 1$,
then all the entries of $v$ are integral multiples of a quantity $E^{-1}$ where

$$E \le (2n)^{\lfloor \log((n+2)/\alpha) \rfloor + 3/2} \cdot (\lfloor \log((n + 2)/\alpha) \rfloor)!.$$

Consequently $v \cdot x$ is an integral multiple of $E^{-1}$ for every $x \in (\{-1, 1\}^n \cup \{0, 1\}^n)$. Since there are
points of $\{-1, 1\}^n$ on $H_k$, it must be the case that $\nu$ is also an integral multiple of $E^{-1}$. So if any $x \in
(\{-1, 1\}^n \cup \{0, 1\}^n)$ is such that $d(x, H_k) < E^{-1}$, then $d(x, H_k) = 0$ and hence $x$ actually lies on $H_k$. Now
recall that for any $x \in S'$ we have $d(x, H_k) \le (n/\alpha)(n + 2)\beta$. Our upper bound on $\beta$ from the theorem
statement ensures that $(n/\alpha)(n + 2)\beta < E^{-1}$, and consequently every $x \in S'$ must lie on $H_k$, proving the
theorem. □